Preface


The book carefully and diligently covers all three aspects related to the teaching of digital circuits: digital 
principles, digital electronics, and digital design. The starting point was the adoption of some fundamental 
premises, which led to a detailed and coherent sequence of contents. Such premises are summarized below. 


Book Premises 


The text is divided into two parts, with the theory in Chapters 1-18 and the lab components in 
Chapters 19-25 plus Appendices A and B. These parts can be taught in parallel if it is a course with 
lectures and lab, or they can be used separately if it is a lecture-only or lab-only course. 


The book provides a clear and rigorous distinction between combinational circuits and sequential 
circuits. In the case of combinational circuits, further distinction between logic circuits and arithmetic 
circuits is provided. In the case of sequential circuits, further distinction between regular designs 
and state-machine-based designs is made. 


The book includes new, modern digital techniques, related, for example, to code types and data protection 
used in data storage and data transmission, with emphasis especially on Internet-based applications. 


The circuit analysis also includes transistor-level descriptions (not only gate-level), thus providing 
an introduction to VLSI design, indispensable in modern digital courses. 


A description of new, modern technologies employed in the fabrication of transistors (both bipolar 
and MOSFET) is provided. The fabrication of memory chips, including promising new approaches 
under investigation, is also presented. 


The book describes programmable logic devices, including a historical review and also details 
regarding state of the art CPLD/FPGA chips. 


Examples and exercises are named to ease the identification of the circuit/design under analysis. 


The experimental part includes not only VHDL synthesis examples but also a summary of the VHDL 
language, a chapter on simulation with VHDL testbenches, and a chapter on simulation with SPICE. 


Finally, a large number of complete experimental examples are included, constructed in a rigorous, 
detailed fashion, including real-world applications, complete code (not only partial sketches), synthesis 
of all circuits onto CPLD/FPGA chips, simulation results, and general explanatory comments. 


Book Contents 


The book can be divided into two parts, with the theory (lectures) in Chapters 1-18 and experimentations 
(laboratory) in Chapters 19-25 plus Appendices A and B. Each of these parts can be further divided as follows. 


Part I Theory (Lectures) 
  Fundamentals: Chapters 1-5 
  Advanced fundamentals: Chapters 6-7 
  Technology: Chapters 8-10 
  Circuit design: Chapters 11-15 
  Additional technology: Chapters 16-18 


Part II Experiments (Laboratory) 
  VHDL summary: Chapter 19 
  VHDL synthesis: Chapters 20-23 
  VHDL simulation: Chapter 24 and Appendix A 
  SPICE simulation: Chapter 25 and Appendix B 


The book contains 163 enumerated examples, 622 figures, and 545 exercises. 


Audience 


This book addresses the specific needs of undergraduate and graduate students in electrical engineering, 
computer engineering, and computer science. 


Suggestions on How to Use the Book 


The tables below present suggestions for the lecture and lab sections. If it is a lecture-only course, then 
any of the three compositions in the first table can be employed, depending on the desired course level. 
Likewise, if it is a lab-only course, then any of the three options suggested in the second table can be 
used. In the more general case (lectures plus lab), the two parts should be taught in parallel. In the tables 
an ‘x’ means full content, a slash ‘/’ indicates a partial (introductory sections only) content, and a blank 
means that the chapter should be skipped. These, however, are just suggestions based on the author’s 
own experience, so they should serve only as a general reference. 


Theory Chapters 

Lecture Level   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 
Fundamental     x x x x x / / / x x x x x / / x 
Intermediate    x x x x x x / x x x x x x x x x x 
Advanced        x x x x x x x x x x x x x x x x x x 


Practice Chapters and Appendices 

Lab Level       19 20 21 22 23 24 25 A B 
Fundamental     x x x x x 
Intermediate    x x x x x x x 
Advanced        x x x x x x x x x 




Companion Web Site and Contacts 


Book Web site: books.elsevier.com/companions/9780123742704. 
Author’s email: Please consult the Web site above. 
Introduction 


Objective: This chapter introduces general notions about the digital electronics field. It also explains 
some fundamental concepts that will be useful in many succeeding chapters. In summary, it sets the 
environment for the digital circuit analysis and designs that follow. 


Chapter Contents 


1.1 Historical Notes 

1.2 Analog versus Digital 

1.3 Bits, Bytes, and Words 

1.4 Digital Circuits 

1.5 Combinational Circuits versus Sequential Circuits 
1.6 Integrated Circuits 

1.7 Printed Circuit Boards 

1.8 Logic Values versus Physical Values 

1.9 Nonprogrammable, Programmable, and Hardware Programmable 
1.10 Binary Waveforms 

1.11 DC, AC, and Transient Responses 

1.12 Programmable Logic Devices 

1.13 Circuit Synthesis and Simulation with VHDL 

1.14 Circuit Simulation with SPICE 

1.15 Gate-Level versus Transistor-Level Analysis 


1.1 Historical Notes 


The modern era of electronics started with the invention of the transistor by William Shockley, John 
Bardeen, and Walter Brattain at Bell Laboratories (Murray Hill, New Jersey) in 1947. A partial picture 
of the original experiment is shown in Figure 1.1(a), and a popular commercial package, made out of 
plastic and called TO-92, is depicted in Figure 1.1(b). Before that, electronic circuits were constructed 
with vacuum tubes (Figure 1.1(c)), which were large (almost the size of a household light bulb), slow, 
and required high voltage and high power. 

The first transistor was called a point-contact transistor because it consisted of two gold foils whose tips 
were pressed against a piece of germanium. This can be observed in Figure 1.1(a) where a wedge with 
a gold foil glued on each side and slightly separated at the bottom is pressed against a germanium slab. 
Any transistor has three terminals (see Figure 1.1(b)), which, in the case of Figure 1.1(a), correspond to 
the gold foils (called emitter and collector) and the germanium slab (called base). 

Germanium was later replaced with silicon, which is cheaper, easier to process, and presents electrical 
properties that are also adequate for electronic devices. 
FIGURE 1.1. (a) The first transistor (1947); (b) A popular commercial encasing (called TO-92); (c) A vacuum tube. 


Despite the large acclaim of the transistor invention, which led the three inventors to eventually share 
the Nobel Prize in Physics in 1956, Shockley was very dissatisfied with Bell Laboratories (among other 
reasons, his name was not included in the transistor patent, since the final experiment was 
conducted without his participation). So he eventually left to start his own company, initially intending 
to mass produce low-cost transistors. 


The birth of Silicon Valley (or “Had Shockley gone elsewhere ...”) 


The most common material used in electronic devices is Si (silicon), followed by GaAs (gallium arsenide), 
Ge (germanium), and others (all used much less frequently than Si). However, in their original form, 
these materials are of very little interest. What makes them useful is a process called doping, which 
consists of adding an impurity (called dopant) to them that creates either free electrons or free holes (a 
hole is a “space” where an electron is missing; the space does not have a fixed position, so it can be 
moved by an external electric field, resulting in an electric current). Depending on whether the 
dopant generates free electrons (negative charges) or free holes (positive charges), the doped semiconductor 
is classified as n-type or p-type, respectively. For Si, popular n-type dopants are P (phosphorus) and As 
(arsenic), while p-type dopants include B (boron) and Al (aluminum). 

The point-contact approach used in the first transistor was not adequate for mass production, so 
Shockley diligently searched for another approach, eventually leading to a very thin region (base) of 
type n (or p) sandwiched between two other regions (emitter and collector) of type p (or n). Because of 
its two-junction construction, this type of transistor is called BJT (bipolar junction transistor). 

Shockley did his undergraduate studies at the California Institute of Technology and earned his PhD 
at MIT. In 1955 he returned to the West Coast to set up his company, Shockley Semiconductor, in 
Mountain View, California, which is south of San Francisco. 

The subsequent events are nicely described by Gordon Moore in a speech he gave at the 
groundbreaking ceremony for the new engineering building at Caltech in 1994 ([Moore94], available at the 
Caltech Archives; also available at http://nobelprize.org under the title “The Accidental 
Entrepreneur”). Very briefly, the events are as follows. Because of Shockley’s difficult personality, eight of his 
employees, including Gordon Moore and Robert Noyce, decided to leave Shockley Semiconductor in 
1957 and start a new company, called Fairchild Semiconductor, with a little capital of their own and the 
bulk of the financing from Fairchild Camera, an East Coast corporation. The new company, like many 
other spin-offs that followed, established itself in the same region as Shockley’s company. Fairchild 
Semiconductor turned out to be a very successful enterprise with nearly 30,000 employees after a little 
over 10 years. However, due to management problems and other conflicts between the West Coast 
(Fairchild Semiconductor) and East Coast (Fairchild Camera) operations, in 1968 Moore and Noyce left 
and founded their own company, Intel. 

One of Intel’s first great achievements was the development of MOS transistors with polysilicon 
gates instead of metal gates, a technology that took several years for other companies to catch up with. 
The first integrated microprocessor, called 4004 and delivered in 1971, was also developed by Intel (see 
Figure 1.2(a)). It was a 4-bit processor with nearly 2300 transistors that was capable of addressing 9.2 k of 
external memory, employed mainly in calculators. Even though Intel’s major business in the 1980s was 
the fabrication of SRAM and DRAM memories, its turning point was the advent of personal computers, 
for which Intel still manufactures most of the microprocessors (like that shown in Figure 1.2(b)). 

Even though the development of the first integrated circuit in 1958 is credited to Robert Noyce (while 
still at Fairchild) and Jack Kilby (working independently at Texas Instruments), the 2000 Nobel Prize in 
Physics for that development was awarded only to the latter. 

In summary, many spin-offs occurred after Shockley first decided to establish his short-lived 
company south of San Francisco (Shockley went on to become a professor at Stanford University). Because 
most of these companies dealt with silicon or silicon-related technologies, the area was dubbed “Silicon 
Valley” by a journalist in 1971, a nickname that rapidly became well known worldwide. So, one might 
wonder, “Had Shockley decided to go elsewhere...” 

But electronics is not made of memories and microprocessors alone. There are many other companies 
that specialize in all sorts of electronic devices. For example, some specialize in analog devices, from 
basic applications (operational amplifiers, voltage regulators, etc.) to very advanced ones (wireless links, 
medical implants and instrumentation, satellite communication transceivers, etc.). There are also 
companies that act in quite different parts of the digital field, like those that manufacture PLDs (programmable 
logic devices), which constitute a fast-growing segment for the implementation of complex systems. As 
a result, chips containing whole systems and millions of transistors are now commonplace. 

Indeed, today’s electronic complexity is so vast that probably no company or segment can claim 
that it is the most important or the most crucial because none can cover alone even a fraction of what is 
being done. Moreover, companies are now spread all over the world, so the actual contributions come 
from all kinds of places, people, and cultures. In fact, of all aspects that characterize modern electronic 
technologies, this worldwide congregation of people and cultures is probably what best represents its 
beauty. 


FIGURE 1.2. (a) Intel 4004, the first microprocessor (1971, 10 μm nMOS technology, ~2300 transistors, 108 kHz); 
(b) Pentium 4 microprocessor (2006, 90 nm CMOS technology, > 3 GHz, 180 million transistors). (Reprinted with 
permission of Intel.) 
1.2 Analog versus Digital 


Electronic circuits can be divided into two large groups called analog and digital. The first deals with 
continuous-valued signals while the second concerns discrete-valued signals. Roughly speaking, the former 
deals with real numbers while the latter deals with integers. 

Many quantities are continuous by nature, like temperature, sound intensity, and time. Others are 
inherently discrete, like a game’s score, the day of the month, or a corporation’s profit. Another example 
is a light switch, which has only two discrete states (digital), versus a light dimmer, which has innumerable 
continuous states (analog). 

From a computational point of view, however, any signal can be treated as digital. This is made 
possible by a circuit called analog-to-digital converter (A/DC), which converts the analog signal into digital, 
and by its counterpart, the digital-to-analog converter (D/AC), which reconverts the signal to its original 
analog form when necessary. 

This process is illustrated in Figure 1.3. A sample and hold (S&H) circuit periodically samples the 
incoming signal, providing static values for the A/DC, which quantizes and represents them by means 
of bits (bits and bytes will be defined in the next section). At the output of the digital system, binary 
values are delivered to the D/AC, which converts them into analog but discontinuous values. A 
low-pass filter (LPF) is therefore required to remove the high frequency components, “rounding” the 
signal’s corners and returning it approximately to its original form. 

To illustrate the importance of this method, consider the recording and playing of music. The music 
captured by the microphones in a recording studio is analog and must be delivered in analog form to the 
human ear. However, in digital form, storage is easier, cheaper, and more versatile. Moreover, the music 
can be processed (filtered, mixed, superimposed, etc.) in so many ways that would be simply impossible 
otherwise. For those reasons, the captured sound is immediately converted from analog to digital by the 
A/DC, then processed, and finally recorded on a CD. The CD player does the opposite, that is, it reads 
the digital information from the CD, processes it, then passes it through the D/AC circuit, and finally 
amplifies the analog (reconstituted) signal for proper loudspeaker reproduction. 

FIGURE 1.3. Interface between a digital system and the analog world. 


Even though the introduction of quantization errors in the conversion/deconversion process described 
above is inevitable, the process is still viable because a large enough number of bits can be employed so that 
the resulting error becomes too small to be perceived or too small to be relevant. As an example, let us 
say that the analog signal to be converted ranges from 0 V to 1 V and that 8 bits are used to encode it. In 
this case, $2^8 = 256$ discrete values are allowed, so the analog signal can be divided into 256 intervals of 
3.9 mV each (because 1 V/256 = 3.9 mV) with a binary word used to represent each interval. One option 
for the encoding would then be (0 V to 3.9 mV) = "00000000", (3.9 mV to 7.8 mV) = "00000001", ..., (996.1 mV 
to 1 V) = "11111111". More bits can be used, and other encoding schemes also exist, like the use of 
nonuniform intervals, so the maximum error can be tailored to meet specific applications. In standard digital 
music for CDs, for example, 16 bits are employed in each channel. 
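
The uniform encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name quantize and its parameters are hypothetical (they do not come from the text), and voltages exactly at the upper limit are assumed to map to the last codeword.

```python
def quantize(voltage, n_bits=8, v_min=0.0, v_max=1.0):
    """Return the binary codeword of the interval that contains `voltage`.

    Hypothetical helper: uniform intervals, as in the 0 V to 1 V example.
    """
    levels = 2 ** n_bits                    # 256 intervals for 8 bits
    step = (v_max - v_min) / levels         # about 3.9 mV in the example
    index = int((voltage - v_min) / step)   # interval that holds the voltage
    index = max(0, min(levels - 1, index))  # clamp so v_max uses the last codeword
    return format(index, "0{}b".format(n_bits))


step_mv = 1000 * (1.0 - 0.0) / 2 ** 8
print(round(step_mv, 1))   # interval width in mV
print(quantize(0.002))     # a voltage inside the first interval
print(quantize(0.005))     # a voltage inside the second interval
print(quantize(1.0))       # the top of the range
```

Calling quantize with n_bits=16 shows how the interval width shrinks when more bits are used, which is the mechanism by which the maximum error is tailored to the application.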

Another important aspect in analog/digital (A/D) conversion is the sampling rate, which is the 
number of times the incoming signal is sampled per second (see the S&H stage in Figure 1.3). The Nyquist 
theorem determines that it has to be greater than twice the signal’s largest frequency. In the case of 
standard digital music, the rate is 44.1 ksamples/s, which therefore allows the capture of signals with 
frequencies slightly over 20 kHz (the limit is 22.05 kHz). 
This is enough because the human ear can detect audio signals from 50 Hz to approximately 20 kHz. 
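
As a quick check of the numbers above, the Nyquist condition can be written as two small Python helpers. The function names are hypothetical and serve only to illustrate the arithmetic.

```python
def nyquist_limit(sampling_rate_hz):
    """Frequency that the signal content must stay below for a given sampling rate."""
    return sampling_rate_hz / 2.0


def satisfies_nyquist(sampling_rate_hz, highest_frequency_hz):
    """True when the rate is greater than twice the signal's largest frequency."""
    return sampling_rate_hz > 2.0 * highest_frequency_hz


print(nyquist_limit(44100))                # limit for standard digital music, in Hz
print(satisfies_nyquist(44100, 20000))     # 20 kHz audio at 44.1 ksamples/s
print(satisfies_nyquist(30000, 20000))     # a rate that would be too low
```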


1.3 Bits, Bytes, and Words 


Even though multilevel logic has been investigated for a long time, two-level logic (called binary) is still 
more feasible. Each component is called a bit, and its two possible values are represented by '0' and '1'. 
Even though the actual (physical) signals that correspond to '0' and '1' are of fundamental importance to 
the technology developers, they are irrelevant to the system users (to a programmer, for example). 

While a single '0' or '1' is called a bit, a group of 4 bits is called a nibble, a group of 8 bits is called a byte, 
a group of 16 bits is called a word, and a group of 32 bits is called a long word. In the VHDL language 
(Chapters 19-24), the syntax is that a single bit has a pair of single quotation marks around it, such as '0' 
or '1', while a group of bits (called a bit vector) has a pair of double quotation marks, such as "00010011". 
This syntax will be adopted in the entire text. 

The leftmost bit of a bit vector is normally referred to as the most significant bit (MSB), while the 
rightmost one is called least significant bit (LSB). The reason for such designations can be observed in 
Figure 1.4; to convert a binary value into a decimal value, each bit must be multiplied by $2^k$, where k 
is the bit’s position in the codeword counted from right to left, starting at 0 (so the right end has the 
lowest weight and the left end has the highest). For example, the decimal value corresponding to 
"10011001" (Figure 1.4) is 153 because 
$1\cdot2^7+0\cdot2^6+0\cdot2^5+1\cdot2^4+1\cdot2^3+0\cdot2^2+0\cdot2^1+1\cdot2^0=153$. 
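
The weighted sum above can be reproduced with a short Python sketch. The function name bit_vector_to_decimal is hypothetical; the bit vector is passed as a string with the MSB on the left, as in the text.

```python
def bit_vector_to_decimal(bits):
    """Sum bit * 2**k, with k counted from the rightmost bit (LSB), starting at 0."""
    total = 0
    for k, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** k
    return total


print(bit_vector_to_decimal("10011001"))   # the byte of the example
print(bit_vector_to_decimal("11111111"))   # largest value of a byte
```

Python's built-in int("10011001", 2) performs the same conversion; the explicit loop is shown only to make the weights visible.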

A popular set of codewords is the ASCII (American Standard Code for Information Interchange) 
code, which is employed to represent characters. It contains 128 7-bit codewords, which are listed in 
Figure 1.5. To encode the word “BIT,” for example, the following sequence of bits would be produced: 
"1000010 1001001 1010100". 

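The 7-bit ASCII codewords can be generated with Python's built-in ord function, which returns the character code. The helper name ascii_codewords is hypothetical. Note that the sequence "1000010 1001001 1010100" corresponds to the uppercase letters B, I, and T; lowercase letters have different codewords.

```python
def ascii_codewords(text):
    """Return the 7-bit ASCII codeword of each character, separated by spaces."""
    for ch in text:
        if ord(ch) > 127:
            raise ValueError("not a 7-bit ASCII character: " + repr(ch))
    return " ".join(format(ord(ch), "07b") for ch in text)


print(ascii_codewords("BIT"))   # uppercase letters
print(ascii_codewords("bit"))   # lowercase letters
```
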
In summary, bits are used for passing information between digital circuits. In other words, they 
constitute the language with which digital circuits communicate. For example, the clock on a microwave 
oven is digital, and it must light each digit in the proper order during the right amount of time. For it to 
work, a clock generator must create a time base, which it communicates to the next circuit (the decoder/ 
driver), where proper signals for driving the digits are created, and which are then communicated to the 
final unit (the display) where a proper representation for time is finally created. These communications 
consist exclusively of bits. 
FIGURE 1.4. A byte and the weights employed to convert it into a decimal number. 

FIGURE 1.5. ASCII code. 


1.4 Digital Circuits 


Each digital circuit can be described by a binary function, which is ultimately how it processes the bits 
that it receives. Say, for example, that a and b are two bits received by a certain circuit, which produces 
bit y at the output. Below are some examples of very popular binary functions. 


y=NOT a (also represented by y=a') 


y=a OR b (or y=a+b, where “+” represents logical OR; not to be confused with the mathematical 
summation sign for addition) 


y=a AND b (or y=a·b, where “·” represents logical AND; not to be confused with the mathematical 
product sign for multiplication) 


The first function (y=NOT a) is called inversion or negation because y is the opposite of a (that is, if 
a='0', then y='1', and vice versa). The second (y=a OR b) is called OR function because it suffices to have 
one input high for the output to be high. Finally, the third function (y=a AND b) is called AND function 
because the output is high only when both inputs are high. Circuits that implement such basic functions 
are called gates, and they are named in accordance with the function that they implement (OR, AND, 
NOR, NAND, etc.). 

There are several ways of representing digital circuits, which depend on the intended level of 
abstraction. Such levels are illustrated in Figure 1.6, where transistor-level is the lowest and system-level is the 
highest. 

FIGURE 1.6. Representation of digital circuits according to the level of abstraction. The lowest level is the 
transistor-level, followed by gate-level representation, all the way up to a complete device. 


When using a transistor-level description, elementary components (transistors, diodes, 
resistors, capacitors, etc.) are explicitly shown in the schematics. In many cases, transistor-level 
circuits can be broken into several parts, each forming a gate (OR, AND, etc.). If gates are used 
as the lowest level of abstraction, it is said to be a gate-level description for the design. After 
this point, nonstandard blocks (subsystems) are normally employed, which are collections of 
gate-level blocks that the designer creates to ease the visualization of the whole system. This is 
called subsystem-level description. Finally, by interconnecting the subsystem blocks, the complete 
system can be represented (system-level representation). At the top of Figure 1.6, an integrated 
circuit (IC) is shown, which is one of the alternatives that might be considered when physically 
implementing the design. 

The most fundamental logic gates are depicted in Figure 1.7. Each has a name, a symbol, and a truth 
table (a truth table is simply a numeric translation of the gate’s binary function). The corresponding 
binary functions are listed below (the functions for three of them have already been given). 

Inverter: Performs logical inversion. 

y=a' or y=NOT a 

Buffer: Provides just the necessary currents and/or voltages at the output. 

y=a 

AND: Performs logical multiplication. 

y=a·b or y=a AND b 

NAND: Produces inverted logical multiplication. 

y=(a·b)' or y=NOT (a AND b) 

OR: Performs logical addition. 

y=a+b or y=a OR b 

NOR: Produces inverted logical addition. 

y=(a+b)' or y=NOT (a OR b) 

XOR: The output is '1' when the number of inputs that are '1' is odd. 

y=a⊕b, y=a XOR b, or y=a·b'+a'·b 

XNOR: The output is '1' when the number of inputs that are '1' is even. 

y=(a⊕b)', y=a XNOR b, y=NOT (a XOR b), or y=a'·b'+a·b 

The interpretation of the truth tables is straightforward. Take the AND function, for example; the 
output is '1' only when all inputs are '1', which is exactly what the corresponding binary function says. 
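
The truth tables of the two-input gates can be generated directly from their binary functions. The sketch below is only an illustration in Python (the dictionary GATES and the function truth_table are hypothetical names); bits are represented by the integers 0 and 1.

```python
# Binary function of each two-input gate, with bits represented as 0 and 1.
GATES = {
    "AND": lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: 1 - (a | b),
    "XOR": lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}


def truth_table(gate):
    """List the rows (a, b, y) for every combination of the two inputs."""
    function = GATES[gate]
    return [(a, b, function(a, b)) for a in (0, 1) for b in (0, 1)]


for name in GATES:
    print(name, truth_table(name))
```

The same table can be used to confirm that the sum-of-products form of the XOR function, y=a·b'+a'·b, agrees with the "odd number of ones" description for every input combination.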

The circuits described above, collectively called digital gates, have a very important point in common: 
None of them exhibits memory. In other words, the output depends solely on the current values of the 
inputs. Another group of elementary circuits, collectively called digital registers, is characterized by the 
opposite fact, that is, all have memory. Therefore, the output of such circuits depends on previous circuit states. 

FIGURE 1.7. Fundamental logic gates (name, symbol, and truth table). 

Of all types of digital registers, the most commonly used is the D-type flip-flop (DFF), whose symbol 
and truth table are depicted in Figure 1.8. The circuit has two inputs, called d (data) and clk (clock), and 
two outputs, called q and q' (where q' is the complement of q). Its operation can be summarized as 
follows. Every time the clock changes from '0' to '1' (positive clock edge), the value of d is copied to q; during 
the rest of the time, q simply holds its value. In other words, the circuit is “transparent” at the moment 
when a positive edge occurs in the clock (represented by q*=d in the truth table, where q* indicates the 
circuit’s next state), and it is “opaque” at all other times (that is, q*=q). 

Two important conclusions can be derived from Figure 1.8. First, the circuit does indeed exhibit 
memory because it holds its state until another clock edge (of proper polarity) occurs. Second, registers 
are clocked, that is, they need a clock signal to control the sequence of events. 

As will be described in detail in succeeding chapters, registers allow the construction of innumerable 
types of digital circuits. As an illustration, Figure 1.9 shows the use of a single DFF to construct a 
divide-by-2 frequency divider. All that is needed is to connect an inverted version of q back to the circuit's 
input. 

The circuit of Figure 1.9 operates as follows. The clock signal (a square wave that controls the whole 
sequence of events), shown in the upper plot of the timing diagram, is applied to the circuit. Because this 
is a positive-edge DFF, arrows are included in the clock waveform to highlight the only points where 
the DFF is transparent. The circuit's initial state was assumed to be q='0', so d='1', which is copied to q 
at the next positive clock edge, producing q='1' (after a little time delay, needed for the signal to traverse 
the flip-flop). The new value of q now produces d='0' (which also takes a little time to propagate through 
the inverter). Then, at the next positive clock transition, d is again copied to q, this time producing q='0' 
and so on. Comparing the waveforms for clk and q, we observe that indeed the frequency of the latter is 
one-half that of the former. 
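
The behavior just described can be mimicked with a small behavioral model in Python. This is a sketch under simplifying assumptions, not a circuit-accurate simulation: propagation delays are ignored, the clock is given as a list of sampled values, and the names DFlipFlop and divide_by_2 are hypothetical.

```python
class DFlipFlop:
    """Idealized positive-edge D-type flip-flop (no propagation delay)."""

    def __init__(self, q=0):
        self.q = q            # stored state
        self._prev_clk = 0    # previous clock value, used to detect edges

    def tick(self, clk, d):
        """Apply one clock sample; q copies d only on a '0' to '1' transition."""
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q


def divide_by_2(clock_samples):
    """Feed an inverted copy of q back to d and record q for each clock sample."""
    dff = DFlipFlop(q=0)      # initial state q='0', as assumed in the text
    output = []
    for clk in clock_samples:
        d = 1 - dff.q         # inverter between q and d
        output.append(dff.tick(clk, d))
    return output


clk = [0, 1, 0, 1, 0, 1, 0, 1]
print(clk)
print(divide_by_2(clk))
```

In the printed sequences, q changes only at the rising edges of clk and needs two clock periods to complete one of its own, which is the divide-by-2 behavior.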

In summary, the importance of registers is that they allow the construction of sequential logic circuits 
(defined below). 


FIGURE 1.8. Symbol and truth table for a positive-edge triggered D-type flip-flop. 


FIGURE 1.9. Application of a register (DFF) in the construction of a frequency divider. 
1.5 Combinational Circuits versus Sequential Circuits 


Throughout this book, a rigorous distinction is made between combinational logic circuits and sequential 
logic circuits. This is because distinct analysis and design techniques can (and should) be adopted. 

By definition, a combinational circuit is one in which the output depends solely on its present 
input, while a sequential circuit is one in which the output depends also (or only) on previous system 
states. Consequently, the former is memoryless, while the latter requires storage elements (normally 
flip-flops). 

The gates seen in Figure 1.7 are examples of combinational circuits, while the frequency divider of 
Figure 1.9 is an example of a sequential circuit. The clock on the microwave oven mentioned earlier is 
also a sequential circuit because its next state depends on its present state. On the other hand, when we 
press the “+” sign on a calculator, we perform a combinational operation because the result is not affected 
by previous operations. However, if we accumulate the sum, the circuit operates in a sequential fashion 
because now the result is affected by previous sums. 
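
The calculator example can be expressed as a short Python analogy (software, not hardware; the names add and Accumulator are hypothetical). The plain addition depends only on its present inputs, while the accumulator keeps a stored state that affects the next result.

```python
def add(a, b):
    """Combinational behavior: the result depends only on the present inputs."""
    return a + b


class Accumulator:
    """Sequential behavior: the result also depends on the stored (previous) sum."""

    def __init__(self):
        self.total = 0        # state held between operations

    def add(self, value):
        self.total += value
        return self.total


print(add(2, 3), add(2, 3))      # same inputs, same output every time

acc = Accumulator()
print(acc.add(3), acc.add(3))    # same input, different outputs
```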

One important point to observe, however, is that not all circuits that possess memory are 
sequential. For example, a regular computer memory (SRAM or DRAM), from a memory-read perspective, is 
indeed a combinational circuit because a data retrieval is not affected by previous data retrievals. 

To conclude, it is important to mention that digital circuits can be also classified as logical and arithmetic 
depending on the type of function they implement. For example, an AND gate (logical multiplier) is an 
example of a logical circuit, while a regular (arithmetic) multiplier is an example of an arithmetic circuit. 


1.6 Integrated Circuits 


Digital ICs (also referred to as “chips”) are constructed with transistors. Because there are two 
fundamental types of transistors, called bipolar junction transistor (BJT, Chapter 8) and metal oxide 
semiconductor field effect transistor (MOSFET, or MOS transistor, Chapter 9), digital ICs can be 
classified as BJT-based or MOS-based. Moreover, the power-supply voltage used to bias such circuits (that 
is, to provide the energy needed for their operation) is called $V_{CC}$ in the former and $V_{DD}$ in the latter. 
These parameters are very important because the lower they are, the less power the circuit consumes 
(recall that power is proportional to $V^2$). Typical values for these parameters will be presented in 
the next section, but briefly speaking they go from 5 V (old BJT- and MOS-based chips) down to 1 V 
(newest MOS-based chips). However, the supply voltage is not the only factor that affects the power 
consumption; another fundamental factor is the dynamic current (that is, the current that depends on 
the speed at which the circuit is operating), which is particularly important in MOS-based circuits. 
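
To get a feeling for the quadratic dependence on the supply voltage, the sketch below compares two supply voltages in Python. It assumes that everything else (circuit, load, and operating speed) stays the same, which is a simplification; the function name relative_power is hypothetical.

```python
def relative_power(v_new, v_old):
    """Ratio of power consumption when only the supply voltage changes (P ~ V**2)."""
    return (v_new / v_old) ** 2


ratio = relative_power(1.0, 5.0)   # going from a 5 V supply to a 1 V supply
print(ratio)                       # fraction of the original power
print(1 / ratio)                   # reduction factor
```

Under these assumptions, lowering the supply from 5 V to 1 V reduces the power by a factor of about 25.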

For any digital architecture to be of practical interest, it must be “integrateable”; that is, it must allow 
the construction of very dense (millions of gates) ICs with adequate electrical parameters, manageable 
power consumption, and reasonable cost. As will be described in Chapter 10, only after the 
development of the TTL (transistor-transistor logic) family in the 1970s did digital integration become viable, giving 
origin to the very successful 74-series of BJT-based logic ICs (now almost obsolete), which operates with 
$V_{CC}$ = 5 V. 

Starting in the late 1970s, MOS-based ICs began to gradually replace BJT-based circuits. The main 
reasons for that are the much smaller silicon space required by the former and especially their much 
lower power consumption. Indeed, the main MOS-based logic family, called CMOS (which stands 
for complementary MOS because the gates are constructed with a combination of n- and p-type MOS 
transistors), exhibits the lowest power consumption of all digital families. 

CMOS technology is normally referred to by using the smallest dimension that can be fabricated 
(shortest transistor channel, for example), also called technology node, and it is expressed in micrometers 
or nanometers. This parameter was 8 μm in the beginning of the 1970s, and it is now just 65 nm, with 
45 nm devices already tested and expected to be shipped in 2008. For example, the first integrated 
microprocessor (Intel 4004, mentioned earlier) was delivered in 1971 using 10 μm nMOS technology. 

Currently, all digital ICs are fabricated with MOS transistors, with BJTs reserved for only very specific 
applications, like ECL and BiCMOS logic (both described in Chapter 10). In analog applications (like 
radio-frequency circuits for wireless communication), however, the BJT still is a major contender. 

Examples of digital chips fabricated using 65nm CMOS technology include the top performance 
FPGAs (field programmable gate arrays) Virtex 5 (from Xilinx) and Stratix III (from Altera), both described 
in Chapter 18. These devices can have millions of transistors, over 200,000 flip-flops, and over 1000 user 
I/O pins. Indeed, the number of pins in digital ICs ranges from 8 to nearly 2000. 

Current ICs are offered in a large variety of packages whose main purposes are to provide the nec- 
essary heat dissipation and also the number of pins needed. Such packages are identified by standardized 
names, with some examples illustrated in Figure 1.10, which include the following: 


DIP: Dual in-line package 

PLCC: Plastic leaded chip carrier 

LQFP: Low-profile quad flat pack 

TQFP: Thin quad flat pack 

PQFP: Plastic quad flat pack 

FBGA: Fine-pitch ball grid array 

FBGA Flip-Chip: FBGA constructed with flip-chip technology (the chip is “folded”) 


PGA2 Flip-Chip: Pin grid array constructed with flip-chip technology and cooler incorporated into 
the package 


In Figure 1.10, the typical minimum and maximum numbers of pins for each package are also given. 
Note that when this number is not too large (typically under 300), the pins can be located on the sides 
of the IC (upper two rows of Figure 1.10). However, for larger packages, the pins are located under 
the chip. 

In the latter case, two main approaches exist. The first is called BGA (ball grid array), which consists 
of small spheres that are soldered to the printed circuit board; BGA-based packages can be observed in 
the third row of Figure 1.10. The other approach is called PGA (pin grid array) and consists of an array of 
pins instead of spheres, which can be observed in the last row of Figure 1.10, where the top and bottom 
views of one of Intel’s Pentium 4 microprocessors are shown. 

Finally, note in Figure 1.10 that most packages do not require through holes in the printed circuit board 
because they are soldered directly on the copper stripes, a technique called SMD (surface mount device). 


1.7 Printed Circuit Boards 


A printed circuit board (PCB; Figure 1.11(a)) is a thin board of insulating material with a copper layer 
deposited on one or both sides on which electronic devices (ICs, capacitors, resistors, diodes, etc.) and 
other components (connectors, switches, etc.) are soldered. These devices communicate with each other 
through wires that result after the copper layers are properly etched. Figure 1.11(b) shows examples of 
PCBs after the devices have been installed (soldered). 


[Figure 1.10: photographs of IC packages. Package shown and typical pin range for that package type:]

DIP 16              (8-64 pins)
PLCC 44             (20-84 pins)
LQFP 64             (32-208 pins)
TQFP 100            (32-144 pins)
PQFP 128            (44-240 pins)
FBGA 128            (~100-1000 pins)
FBGA Flip-Chip 672  (~400-2000 pins)
PGA2 Flip-Chip 478  (Intel Pentium 4 processor, top and bottom views)


FIGURE 1.10. Examples of IC packages, each accompanied by name, number of pins, and typical range of pins 
(between parentheses) for that package type. When the number of pins is not too large (< 300), they can be 
located on the sides of the IC, while in larger packages they are located under the chip. In the latter, two main 
approaches exist, called BGA (ball grid array, shown in the third row) and PGA (pin grid array, shown in the 
last row). Pentium 4 microprocessor reprinted with permission of Intel. 


FIGURE 1.11. (a) A PCB; (b) Examples of assembled PCBs. 


The most common material used in the fabrication of PCBs is called FR-4 (flame resistant category 4), 
which is a woven fiberglass mat, reinforced with an epoxy resin with a greenish appearance. In summary, 
a common PCB is fiberglass, resin, and copper. 

The wires that are created on the PCB after the copper layer is etched are called traces, which can 
be very narrow and close to each other. Standard processes require a minimum width and spacing of 
0.25mm (or 10 mils; one mil is one-thousandth of an inch). More advanced processes are capable of 
handling traces with width and spacing as low as 0.1mm (4mils) and holes with a minimum diameter 
of 0.1mm. 

When ICs with a large number of pins are used, multilayer PCBs are normally required to provide 
sufficient interconnections (wires). In that case, the PCB is fabricated with several sheets glued on top 
of each other with the total number of layers commonly ranging between two and eight, though many 
more layers (greater than 30) can also be manufactured. 

The standard thickness of a single-layer PCB is 1.6mm (1/16in.), but thinner boards also exist. For 
example, individual layers in a multilayer PCB can be thinner than 0.3mm. 


1.8 Logic Values versus Physical Values 


We know that the digital values in binary logic are represented by '0' and '1'. But in the actual circuits, 
they must be represented by physical signals (normally voltages, though in some cases currents are also 
used) with measurable magnitudes. So what are these values? 

First, let us establish the so-called reference physical values for '0' and '1'. The power-supply voltages 
for circuits constructed with bipolar junction transistors (Chapter 8) are $V_{CC}$ (a constant positive value) 
and GND (ground, 0 volts). Likewise, for circuits constructed with MOS transistors (Chapter 9), they 
are $V_{DD}$ (a constant positive value) and GND. These are generally the reference values for '1' and '0', that 
is, '1' = $V_{CC}$ or '1' = $V_{DD}$ and '0' = GND. This means that when a '0' or '1' must be applied to the circuit, a 
simple jumper to GND or $V_{CC}$/$V_{DD}$, respectively, can be made. 

Now let us describe the signal values for '0' and '1'. In this case, '0's and '1's are not provided by jumpers 
to the power-supply rails but normally by the output of a preceding gate, as illustrated in Figure 1.12(a). 
$V_{OL}$ represents the gate's maximum output voltage when low, $V_{OH}$ represents its minimum output 
voltage when high, $V_{IL}$ is the maximum input voltage guaranteed to be interpreted as '0', and finally $V_{IH}$ is 
the minimum input voltage guaranteed to be interpreted as '1'. In this particular example, the old 5 V 
HC CMOS family, employed in the 74-series of digital ICs (described in Chapter 10), is depicted, which 
exhibits $V_{OL}$ = 0.26 V, $V_{OH}$ = 4.48 V, $V_{IL}$ = 1 V, and $V_{IH}$ = 3.5 V (at 25°C and $|I_O|$ = 4 mA). 

The first conclusion from Figure 1.12(a) is that the physical values for '0' and '1' are not values, but 
ranges of values. The second conclusion is that they are not as good as the reference values (indeed, those 
are the best-case values). 

Another very important piece of information extracted from these parameters is the family's noise 
margin. In the case of Figure 1.12(a), the following noise margins result when low and when high: 
$NM_L = V_{IL} - V_{OL}$ = 0.74 V, $NM_H = V_{OH} - V_{IH}$ = 0.98 V (these values are listed in Figure 1.12(b)). 
Hence, we conclude that when using ICs from this family, any noise whose peak amplitude is under 0.74 V 
is guaranteed not to corrupt the data. 
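The noise-margin arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `noise_margins` and its argument names are assumptions made for this example, while the numeric values are the 5 V HC CMOS figures quoted in the text.

```python
def noise_margins(v_ol, v_oh, v_il, v_ih):
    """Return (NM_L, NM_H) in volts for one logic family."""
    nm_low = v_il - v_ol    # room for noise on top of a low output level
    nm_high = v_oh - v_ih   # room for noise below a high output level
    return nm_low, nm_high

# 5 V HC CMOS values quoted in the text (25 degrees C, 4 mA).
nm_l, nm_h = noise_margins(v_ol=0.26, v_oh=4.48, v_il=1.0, v_ih=3.5)

# The guaranteed noise immunity is set by the smaller of the two margins.
worst_case = min(nm_l, nm_h)
print(f"NM_L = {nm_l:.2f} V, NM_H = {nm_h:.2f} V, worst case = {worst_case:.2f} V")
```

The smaller of the two margins is the one that limits the tolerable noise amplitude, which is why the text quotes 0.74 V.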

To conclude, Figure 1.12(c) shows the same kind of information for the complete LVCMOS (low-voltage 
CMOS) series of standard I/Os, which are among the most popular in modern designs, and will be described 
in detail in Chapter 10. As can be seen, the supply voltages are (from older to newer) 3.3V, 2.5V, 1.8V, 1.5V, 
1.2V, and 1V. 


[Figure 1.12: (a) voltage ranges for '0' and '1' between GND and $V_{DD}$ = 5 V; (b) parameter table, 
reproduced below; (c) the 3.3 V, 2.5 V, 1.8 V, 1.5 V, 1.2 V, and 1 V LVCMOS families.]

5 V HC CMOS @ 4 mA, 25°C
Output          V_OL = 0.26 V    V_OH = 4.48 V
Input           V_IL = 1 V       V_IH = 3.5 V
Noise margin    NM_L = 0.74 V    NM_H = 0.98 V


FIGURE 1.12. (a) Analysis of the physical values for '0' and '1', which are ranges of values; (b) Min-max values 
of input/output parameters and corresponding noise margins for the old HC logic family; (c) Supply voltages 
and min-max input/output parameters for all standardized low-voltage CMOS logic families (drawn 
approximately to scale). 


1.9 Nonprogrammable, Programmable, and Hardware Programmable 


Another important separation that can be made between logic ICs pertains to their programmability, as 
described below. 


■ Nonprogrammable ICs: Integrated circuits with a fixed internal structure and no software-handling 
capability. This is the case, for example, of the 74 series mentioned earlier (TTL and HC families, 
described in detail in Chapter 10). 


■ Programmable ICs: Integrated circuits with software-handling capability. Even though their physical 
structure is fixed, the tasks that they perform can be programmed. This is the case with all 
microprocessors, for example. 


■ Hardware programmable ICs: Integrated circuits with programmable physical structures. In other 
words, the hardware can be changed. This is the case of CPLD/FPGA chips, which will be 
introduced in Section 1.12 and described in detail in Chapter 18. Any hardware-programmable IC can 
be a software-programmable IC, depending on how the hardware is configured (for example, it can 
be configured to emulate a microprocessor). 


In the beginning of the digital era, only the first category of ICs was available. Modern designs, however, 
fall almost invariably in the other two. The last category, in particular, allows the construction of very 
complex systems with many different units all on the same chip, a type of design often referred to as SoC 
(system-on-chip). 


1.10 Binary Waveforms 


Figure 1.13 shows three idealized representations for binary waveforms, where the signal x is assumed 
to be produced by a circuit that is controlled by clk (clock). 

The view shown in Figure 1.13(a) is completely idealized because, in practice, the transitions are not 
perfectly vertical. More importantly, because x depends on clk, some time delay between them is inevi- 
table, which was also neglected. Nevertheless, this type of representation is very common because it 
illustrates the circuit’s functional behavior. 

The plots in Figure 1.13(b) are a little more realistic because they take the propagation delay 
into account. The low-to-high propagation delay is called $t_{pLH}$, while the high-to-low is called 
$t_{pHL}$. 

A third representation is depicted in Figure 1.13(c), this time including the delay and also the fact 
that the transitions are not instantaneous (though represented in a linear way). The time delays are 
measured at 50% of the logic voltages. 

In continuation, a nonidealized representation will be shown in the next section when describing the 
meaning of “transient response.” 

Another concept that will be seen several times is presented in Figure 1.14. It is called duty cycle, and 
it represents the fraction of time during which a signal remains high. In other words, duty cycle = $T_H/T$, 
where $T = T_H + T_L$ is the signal's period. For example, the clock in Figure 1.13 exhibits a duty cycle of 50%, 
which is indeed the general case for clock signals. 


FIGURE 1.13. Common idealized representations of binary waveforms. (a) Completely idealized view with 
vertical transitions and no time delays; (b) With time delays included; (c) With time delays and nonvertical 
transitions (the delays are measured at the midpoint between the two logic voltages). 


FIGURE 1.14. Illustration of duty cycle ($T_H/T$). 
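As a small illustration of the duty-cycle definition, the sketch below computes it both from the high and low times and from a uniformly sampled binary waveform. The function names and the sample values are hypothetical, chosen only for this example.

```python
def duty_cycle(t_high, t_low):
    """Duty cycle T_H / T, with T = T_H + T_L (any consistent time unit)."""
    return t_high / (t_high + t_low)

def duty_cycle_from_samples(samples):
    """Estimate the duty cycle of a uniformly sampled binary waveform
    (a list of 0s and 1s) that covers a whole number of periods."""
    return sum(samples) / len(samples)

print(duty_cycle(5, 5))                                   # symmetric clock
print(duty_cycle(2, 6))                                   # high for 2 of every 8 time units
print(duty_cycle_from_samples([1, 1, 0, 0, 0, 0, 0, 0]))  # the same waveform, sampled
```

The sampled estimate is only meaningful when the samples span an integer number of periods; otherwise the partial period biases the result.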


1.11 DC, AC, and Transient Responses 


The signal produced at the output of a circuit when a certain stimulus is applied to its input is called 
circuit response. There are several kinds of such responses, which depend on the type of stimulus applied 
to the circuit. Together they allow a thorough characterization of the circuit performance. 

The two main types of responses for analog linear circuits are DC response and AC response. Simi- 
larly, the two main responses for digital circuits are DC response and transient response. Even though 
these types of behaviors will be discussed in later chapters and also in the simulation examples using 
SPICE, a brief description of each one follows. 


DC response 


DC response is the response of a circuit to a large-amplitude, slowly varying stimulus. DC stands for direct 
current, meaning a constant electric current or voltage. The name “DC response” therefore indicates that 
each output value is measured for a fixed input value. In other words, the input signal is varied a little, 
then enough time is given for the output signal to completely settle, and only then are the measurements 
taken. During the tests, a large range is covered by the input signal. This type of analysis will be studied 
in Sections 8.4 (for BJT-based circuits) and 9.4 (for MOS-based circuits). 

An example of DC response is presented in Figure 1.15, which shows the voltage at the output of a 
CMOS inverter (Section 9.5) when its input is subject to a slowly varying voltage ranging from GND 
(0 V) to $V_{DD}$ (5 V in this example). When the input is low, the output is high, and vice versa. However, 
there is a point somewhere between the two extremes where the circuit changes its condition. This 
voltage is called transition voltage ($V_{TR}$) and is measured at the midpoint between the two logic voltages 
(GND and $V_{DD}$). In this example, $V_{TR} \approx 2.2$ V. 


Transient response 


Transient response represents the response of a circuit to a large-amplitude fast-varying stimulus. This 
type of analysis, also called time response, will be seen several times in subsequent chapters, particularly 
in Sections 8.5 (for BJT-based circuits) and 9.6 (for MOS-based circuits). 

An example is shown in Figure 1.16. It is indeed a continuation of Figure 1.13, now with a more real- 
istic representation. The transient response is specified by means of a series of parameters whose defini- 
tions are presented below. 


$t_r$ (rise time): Time needed for the output to rise from 10% to 90% of its static values 

$t_f$ (fall time): Time needed for the output to fall from 90% to 10% of its static values 

$t_{pLH}$ (low-to-high propagation delay): Time delay between the input crossing 50% and the output 
crossing 50% when the output rises 

$t_{pHL}$ (high-to-low propagation delay): Time delay between the input crossing 50% and the output 
crossing 50% when the output falls 

$t_{on}$ (turn-on delay): Time delay between the input crossing 10% and the output crossing 90% when 
the switch closes (rising edge when displaying current) 

$t_{off}$ (turn-off delay): Time delay between the input crossing 90% and the output crossing 10% when the 
switch opens (falling edge when displaying current) 


Note in Figure 1.16 that the first two parameters, $t_r$ and $t_f$, represent local measurements (they concern 
only one signal, the output), while the others are transfer parameters (they relate one side of the circuit 
to the other, that is, input-output). For that reason, the latter are more representative, so they are more 
commonly used. For simple gates, however, $t_{pLH}$ and $t_{pHL}$ are normally dominated by $t_r$ and $t_f$, so 
$t_{pLH} \approx t_r/2$ and $t_{pHL} \approx t_f/2$. 


FIGURE 1.15. DC response of a CMOS inverter. 


FIGURE 1.16. Transient response parameters. 
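To make the 10%, 50%, and 90% definitions above concrete, the following sketch extracts the rise time and the low-to-high propagation delay from sampled waveforms. It assumes a noninverting gate, piecewise-linear signals, and made-up sample values; the helper `crossing_time` is a hypothetical name introduced only for this illustration.

```python
def crossing_time(times, values, level):
    """First time at which a sampled waveform crosses `level`,
    found by linear interpolation between neighboring samples."""
    points = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 != v1 and min(v0, v1) <= level <= max(v0, v1):
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")

V_DD = 5.0                       # static high level, in volts
t_ns = [0.0, 1.0, 3.0, 4.0]      # sample instants in ns (hypothetical data)
v_in = [0.0, 5.0, 5.0, 5.0]      # input rises between 0 and 1 ns
v_out = [0.0, 0.0, 5.0, 5.0]     # output rises between 1 and 3 ns

# Rise time: a local measurement, taken on the output only (10% to 90%).
t_r = crossing_time(t_ns, v_out, 0.9 * V_DD) - crossing_time(t_ns, v_out, 0.1 * V_DD)

# Propagation delay: a transfer measurement, input 50% to output 50%.
t_plh = crossing_time(t_ns, v_out, 0.5 * V_DD) - crossing_time(t_ns, v_in, 0.5 * V_DD)

print(f"t_r = {t_r:.2f} ns, t_pLH = {t_plh:.2f} ns")
```

The same helper could be applied to a falling edge to obtain the fall time and the high-to-low delay, since it interpolates in either direction.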


AC response 


AC response is the response of a circuit to a small-amplitude sinusoidal stimulus whose frequency is 
swept between two limits. AC stands for alternating current, like the 60 Hz sinusoidal electric current 
(voltage) available from traditional wall outlets. The name “AC response” therefore indicates that the input 
stimulus is sinusoidal, hence it is proper for testing linear analog circuits. Even though it is not related to 
digital circuits, a brief introduction to AC response will be given in Sections 8.6 (for BJT-based circuits) 
and 9.7 (for MOS-based circuits). 


1.12 Programmable Logic Devices 


CPLD (complex programmable logic device) and FPGA (field programmable gate array) chips play an 
increasingly important role in modern electronic design. As mentioned earlier, these chips exhibit a 
unique feature that consists of having hardware that is programmable. Consequently, they can literally 
implement any kind of digital circuit. 

Because of their very attractive features, like high gate and register count, wide range of I/O standards 
and supply voltages, large number of user I/O pins, easy ISP (in-system programming), high speed, 
decreasing cost, and particularly the short time to market and modifiability of products developed with 
such devices, their presence in modern, complex designs has grown substantially over the years. 

Additionally, the ample adoption of VHDL and Verilog in the engineering curriculum, plus the high 
quality and low cost of current synthesis and simulation tools, have also contributed enormously to the 
widespread use of such technology. 

These devices will be studied in detail in Chapter 18. However, just to illustrate their potential, the 
table in Figure 1.17 summarizes the main features of two top-performance FPGAs (Xilinx Virtex 5 and 
Altera Stratix III). The technology used in both is 65 nm CMOS (which is the most advanced at the time 
of this writing), with a supply voltage in the 1 V range. The number of equivalent logic gates is in the 
millions, and the number of flip-flops is over 200,000. They also provide a large amount of SRAM memory 
and DSP (digital signal processing, essentially multipliers and accumulators) blocks for user-defined 
applications along with several PLL (phase locked loop) circuits for clock filtration and multiplication. 
The number of user I/O pins can be over 1000. 


Feature                        Xilinx Virtex 5 (LX series)   Altera Stratix III (L series)
Technology                     CMOS 65 nm (SRAM)             CMOS 65 nm (SRAM)
Core voltage                   1 V                           0.9 V or 1.1 V
Number of CLBs (Virtex)        2,400 to 25,920               ---
Number of LABs (Stratix)       ---                           1,900 to 13,520
Number of Slices (Virtex)      4,800 to 51,840               ---
Number of ALMs (Stratix)       ---                           19,000 to 135,200
Number of flip-flops           19,200 to 207,360             38,000 to 270,400
Max. system clock frequency    550 MHz                       600 MHz
Embedded SRAM (bits)           1.47M to 13.8M                2.4M to 20.4M
Number of DSP blocks           32 to 192                     27 to 96
Number of PLLs                 2 to 6                        4 to 12
Number of I/O pins             400 to 1,200                  288 to 1,104


FIGURE 1.17. Summary of features for two top performance FPGAs. 


1.13 Circuit Synthesis and Simulation with VHDL 


Modern large digital systems are normally designed using a hardware description language like VHDL or 
Verilog. This type of language allows the circuit to be synthesized and fully simulated before any physi- 
cal implementation actually takes place. It also allows previously designed codes and IP (intellectual 
property) codes to be easily incorporated into new designs. 

Additionally, these languages are technology and vendor independent, so the codes are portable and 
reusable with different technologies. After the code has been written and simulated, it can be used, for 
example, to physically implement the intended circuit onto a CPLD/FPGA chip or to have a foundry 
fabricate a corresponding ASIC (application-specific integrated circuit). 

Due to the importance of VHDL, its strong presence in any digital design course is indispensable. For 
that reason, six chapters are dedicated to the matter. Chapter 19 summarizes the language itself, Chapters 
20 and 21 show design examples for combinational circuits, Chapters 22 and 23 show design examples for 
sequential circuits, and finally Chapter 24 introduces simulation techniques using VHDL testbenches. 

All design examples presented in the book were synthesized and simulated using Quartus II Web 
Edition version 6.1 or higher, available free of charge at www.altera.com. The designs simulated using 
testbenches were processed with ModelSim-Altera Web Edition 6.1, also available free of charge at the 
same site. A tutorial on ModelSim is included in Appendix A. 


1.14 Circuit Simulation with SPICE 


SPICE (Simulation Program with Integrated Circuit Emphasis) is a very useful simulator for analog and 
mixed (analog-digital) circuits. It allows any circuit to be described using proper component models 
from which a very realistic behavior is determined. 

The SPICE language provides a means for modeling all sorts of electronic devices, including transis- 
tors, diodes, resistors, capacitors, etc., as well as common integrated circuits, which can be imported from 
specific libraries. All types of independent and dependent signal sources (square wave, sinusoid, piecewise 
linear, etc.) can be modeled as well, so complete circuits, with proper input stimuli, can be evaluated. 

In the case of digital circuits, SPICE is particularly useful for testing the DC and transient responses 
(among others) of small units, like basic gates, registers, and standard cells. For that reason, Chapter 25 
is dedicated to the SPICE language, where several simulation examples are described. Additionally, a 
tutorial on PSpice, which is one of the most popular SPICE packages, is presented in Appendix B. 


1.15  Gate-Level versus Transistor-Level Analysis 


As seen in Figure 1.6, digital circuits can be represented at several levels, starting from the transistor- 
level all the way up to the system-level. Books on digital design usually start at gate-level (though some 
might include a few trivial transistor-level implementations) with all sorts of circuits constructed using 
only gates (AND, NAND, NOR, etc.). As an example, Figure 1.18 shows a common implementation for 
a D latch, which uses NAND gates and an inverter. 

Even though this type of representation is indispensable to more easily describe the circuit function- 
alities, analysis of internal details (at transistor-level) allows the readers to gain a solid understanding of 
a circuit’s real potentials and limitations and to develop a realistic perspective on the practical design of 
actual integrated circuits. 

To illustrate the importance of including transistor-level analysis (at least for the fundamental cir- 
cuits), Figure 1.19 shows some examples of how a D latch is actually constructed (this will be studied in 
Chapter 13). As can be seen, Figures 1.18 and 1.19 have nothing in common. 

In summary, although large circuits are depicted using gate-level symbols, the knowledge of how the 
fundamental gates and registers are actually constructed is necessary to develop a solid understanding 
of the circuit function. 


FIGURE 1.18. Gate-level D latch implementation. 


FIGURE 1.19. Examples of actual (transistor-level) D latch implementations (Chapter 13). 


Binary Representations 


Objective: This chapter shows how bits can be used to represent numbers and characters. The codes 
presented for integers are sequential binary, octal, hexadecimal, Gray, and BCD. The codes for negative 
integers are sign-magnitude, one’s complement, and two's complement. The codes for real numbers are single- 
and double-precision floating-point. And finally, the codes for characters are ASCII and Unicode. 


2.1 Binary Code 


When we press a number, such as 5, in a calculator’s keypad, two things happen: On one hand, the 
number is sent to the display so the user can be assured that the right key was pressed; on the other 
hand, the number is sent to the circuit responsible for the calculations. However, we saw in Section 1.3 
that only two-valued (binary) symbols are allowed in digital circuits. So how is the number 5 actually 
represented? 

The most common way of representing decimal numbers is with the sequential binary code, also referred 
to as positional code, regular binary code, or simply binary code (one must be careful with this type of desig- 
nation because all codes that employ only two-valued symbols are indeed binary). This is what happens 
to the number 5 mentioned above. Even though a different code is normally used to represent the keys 
in the keypad (for example, in the case of computers with a PS/2 keyboard a code called Scan Code 
Set 2 is employed), the vector that actually enters the processor (to perform a sum, for example) normally 
employs sequential binary encoding. 

This type of encoding was introduced in Section 1.3 and consists of using a bit vector where each bit 
has a different weight, given by $2^{k-1}$, where k is the bit's position in the binary word from right to left 
(Figure 2.1(a)). Consequently, if 8 bits (one byte) are used to represent the number 5, then its equivalent 
binary value is "00000101" because 
$0\cdot2^7 + 0\cdot2^6 + 0\cdot2^5 + 0\cdot2^4 + 0\cdot2^3 + 1\cdot2^2 + 0\cdot2^1 + 1\cdot2^0 = 5$. For obvious 
reasons, the leftmost bit is called MSB (most significant bit), while the rightmost one is called LSB (least 
significant bit). 


FIGURE 2.1. (a) Regular one-byte representation for the decimal number 5; (b) General relationship between 
an N-bit binary word and its corresponding decimal value. 


Another example is shown below, where decimals are encoded using 4-bit codewords. 


Decimal   Binary 
0         0000 
1         0001 
2         0010 
3         0011 
4         0100 
5         0101 
...       ... 
14        1110 
15        1111 


The relationship between decimal and binary numbers can then be summarized by the following 
equation: 


$$y = \sum_{k=0}^{N-1} a_k 2^k \qquad (2.1)$$


where y is a decimal number and $a = a_{N-1} \ldots a_1 a_0$ is its corresponding regular binary representation 
(Figure 2.1(b)). 
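Equation 2.1 translates almost directly into code. The sketch below evaluates it for a bit string written MSB first; the function name `binary_to_decimal` is an assumption made for this illustration, and Python's built-in `int(word, 2)` performs the same conversion.

```python
def binary_to_decimal(bits):
    """Evaluate Equation 2.1 for a bit string written MSB first."""
    n = len(bits)
    # bits[0] is a_(N-1) and bits[-1] is a_0, so index i carries weight 2^(N-1-i).
    return sum(int(b) * 2 ** (n - 1 - i) for i, b in enumerate(bits))

for word in ("00000101", "1100", "10001000", "11111111"):
    print(word, "=", binary_to_decimal(word))
```

The first word is the one-byte representation of 5 discussed earlier; the other three are arbitrary bit vectors of different lengths.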

Some additional examples are given below (note that again, whenever appropriate, VHDL syntax is 
employed, that is, a pair of single quotes for single bits and a pair of double quotes for bit vectors). 

"1100" = 1·8 + 1·4 + 0·2 + 0·1 = 12 

"10001000" = 1·128 + 0·64 + 0·32 + 0·16 + 1·8 + 0·4 + 0·2 + 0·1 = 136 

"11111111" = 1·128 + 1·64 + 1·32 + 1·16 + 1·8 + 1·4 + 1·2 + 1·1 = 255 


It is easy to verify that the range of unsigned (that is, nonnegative) decimals that can be represented 
with N bits is: 


$$0 \le y \le 2^N - 1 \qquad (2.2)$$

Some examples showing the largest positive integer (max) as a function of N are presented below. 


N = 8 bits → max = $2^8 - 1$ = 255 

N = 16 bits → max = $2^{16} - 1$ = 65,535 

N = 32 bits → max = $2^{32} - 1$ = 4,294,967,295 

N = 64 bits → max = $2^{64} - 1 \approx 1.8 \cdot 10^{19}$ 

Equation 2.1 allows the conversion of a binary number into a decimal number. To do the opposite, that 
is, to convert a decimal number y into an N-bit binary string $a = a_{N-1} \ldots a_1 a_0$, the successive-approximation 
algorithm can be used (depicted in Figure 2.2). It consists of the following three steps. 

Step 1: If $y \ge 2^{N-1}$, then $a_{N-1}$ = '1' and subtract $2^{N-1}$ from y. Else, $a_{N-1}$ = '0' and y remains the same. 

Step 2: Decrement N. 

Step 3: If N = 0, then done. Else, return to step 1. 


FIGURE 2.2. Flowchart for the successive-approximation algorithm. 
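The three steps of the successive-approximation algorithm can be sketched as follows. The function name `decimal_to_binary` and the range check are additions made for this illustration; the loop body follows the steps as stated.

```python
def decimal_to_binary(y, n_bits):
    """Convert a nonnegative integer into an n_bits-wide bit string (MSB first)
    using the successive-approximation steps."""
    if not 0 <= y <= 2 ** n_bits - 1:          # range given by Equation 2.2
        raise ValueError("y cannot be represented with n_bits bits")
    bits = []
    n = n_bits
    while n > 0:                               # Step 3: done when N reaches 0
        weight = 2 ** (n - 1)
        if y >= weight:                        # Step 1: compare with 2^(N-1)
            bits.append("1")
            y -= weight
        else:
            bits.append("0")
        n -= 1                                 # Step 2: decrement N
    return "".join(bits)

print(decimal_to_binary(26, 5))
```

For y = 26 and N = 5 the weights tested are 16, 8, 4, 2, and 1, in that order, one per iteration.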


EXAMPLE 2.1 DECIMAL-TO-BINARY CONVERSION 


Convert into binary, with N=5 bits, the decimal number 26. 


SOLUTION 


1st iteration: $26 > 2^4$, so $a_4$ = '1'; new y and N are y = 26 − 16 = 10 and N = 5 − 1 = 4. 

2nd iteration: $10 > 2^3$, so $a_3$ = '1'; new y and N are y = 10 − 8 = 2 and N = 4 − 1 = 3. 

3rd iteration: $2 < 2^2$, so $a_2$ = '0'; new y and N are y = 2 and N = 3 − 1 = 2. 

4th iteration: $2 = 2^1$, so $a_1$ = '1'; new y and N are y = 2 − 2 = 0 and N = 2 − 1 = 1. 

5th iteration: $0 < 2^0$, so $a_0$ = '0'; new y and N are y = 0 and N = 1 − 1 = 0 (done). 

Therefore, $a = a_4 a_3 a_2 a_1 a_0$ = "11010". 


2.2 Octal and Hexadecimal Codes 


Octal is a code whose main usage is to represent binary vectors more compactly. It consists simply of 
breaking the binary word into groups of 3 bits, from right to left, and then encoding every group using the 
regular binary code described above. In the examples that follow, the subscript 8 indicates the 
octal base (to avoid confusion with decimal values, for which the base, ten, is omitted by default): 


"11110000" = "11 110 000" = $360_8$ 

"000011000111" = "000 011 000 111" = $0307_8$ 

The hexadecimal code, which is much more popular than octal, serves the same purpose and follows the 
same procedure. In it, groups of 4 bits are encoded instead of 3, hence with base 16. Because 4-bit 
numbers range from 0 to 15, the characters A through F are employed to represent numbers above 9, that 
is, 10 = A, 11 = B, 12 = C, 13 = D, 14 = E, 15 = F. For example: 

"11110000" = "1111 0000" = $\mathrm{F0}_{16}$ 

"1100011001111" = "1 1000 1100 1111" = $\mathrm{18CF}_{16}$ 


The procedure above allows the conversion from binary to hexadecimal and vice versa. To convert 
from hexadecimal directly to decimal, the equation below can be used, where $h = h_{M-1} \ldots h_1 h_0$ is an 
M-digit hexadecimal number, and y is its corresponding decimal value: 


$$y = \sum_{k=0}^{M-1} h_k 2^{4k} \qquad (2.3)$$


EXAMPLE 2.2 HEXADECIMAL-TO-DECIMAL CONVERSION 

Convert $\mathrm{F012A}_{16}$ to decimal. 


SOLUTION 


Using Equation (2.3), we obtain: 

$$y = h_0 2^{4\times0} + h_1 2^{4\times1} + h_2 2^{4\times2} + h_3 2^{4\times3} + h_4 2^{4\times4} = \mathrm{A}\cdot2^{4\times0} + 2\cdot2^{4\times1} + 1\cdot2^{4\times2} + 0\cdot2^{4\times3} + \mathrm{F}\cdot2^{4\times4} = 983{,}338$$
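The grouping procedure for octal and hexadecimal, as well as Equation 2.3, can be sketched in a few lines. The names `group_encode` and `hex_to_decimal` are hypothetical helpers introduced for this illustration only.

```python
DIGITS = "0123456789ABCDEF"

def group_encode(bits, group_size):
    """Encode a bit string in groups of `group_size` bits taken from right to left
    (group_size = 3 gives octal, group_size = 4 gives hexadecimal)."""
    # Pad with zeros on the left so the length is a multiple of the group size.
    padded_len = -(-len(bits) // group_size) * group_size
    bits = bits.zfill(padded_len)
    groups = [bits[i:i + group_size] for i in range(0, len(bits), group_size)]
    return "".join(DIGITS[int(g, 2)] for g in groups)

def hex_to_decimal(h):
    """Evaluate Equation 2.3: sum of h_k * 2^(4k), with h_0 the rightmost digit."""
    return sum(DIGITS.index(d) * 2 ** (4 * k) for k, d in enumerate(reversed(h)))

print(group_encode("11110000", 3))        # octal
print(group_encode("000011000111", 3))    # octal, leading zero kept
print(group_encode("1100011001111", 4))   # hexadecimal
print(hex_to_decimal("F012A"))            # Example 2.2
```

Padding on the left reproduces the right-to-left grouping described in the text, because the incomplete group always ends up at the MSB side.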


2.3 Gray Code 


Another popular code is the Gray code (common in mechanical applications). It is a UDC (unit-distance 
code) because any two adjacent codewords differ by just one bit. Moreover, it is an MSB reflected code 
because the codewords are reflected with respect to the central words and differ only in the MSB 
position. 

To construct this code, we start with zero and then simply flip the rightmost bit that produces a new 
codeword. Two examples are given below, where 2- and 3-bit Gray codes are constructed. 


2-bit Gray code: "00" → "01" → "11" → "10" 
3-bit Gray code: "000" → "001" → "011" → "010" → "110" → "111" → "101" → "100" 
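The construction rule stated above (start with zero, then flip the rightmost bit that produces a new codeword) can be sketched as follows. The function name `gray_sequence` is an assumption made for this illustration.

```python
def gray_sequence(n_bits):
    """Build the n_bits-wide Gray code by repeatedly flipping the rightmost bit
    that yields a codeword not used before."""
    word = [0] * n_bits
    seen = {tuple(word)}
    sequence = ["".join(map(str, word))]
    while len(sequence) < 2 ** n_bits:
        for pos in range(n_bits - 1, -1, -1):      # try the rightmost bit first
            candidate = word.copy()
            candidate[pos] ^= 1                    # flip one bit only
            if tuple(candidate) not in seen:
                word = candidate
                seen.add(tuple(word))
                sequence.append("".join(map(str, word)))
                break
        else:
            raise RuntimeError("no unused neighboring codeword found")
    return sequence

print(" -> ".join(gray_sequence(2)))
print(" -> ".join(gray_sequence(3)))
```

Because each step flips exactly one bit, adjacent codewords differ in a single position, which is the unit-distance property mentioned in the text. The same sequence is also given by the well-known closed form n XOR (n shifted right by one bit).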


2.4 BCD Code 


In the BCD (binary-coded decimal) code, each digit of a decimal number is represented separately by a 
4-bit regular binary code. Some examples are shown below. 


90 → "1001" "0000" 
255 → "0010" "0101" "0101" 
2007 → "0010" "0000" "0000" "0111" 
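BCD encoding amounts to converting each decimal digit independently, as the short sketch below illustrates. The function name `to_bcd` is hypothetical.

```python
def to_bcd(number):
    """Encode each decimal digit of a nonnegative integer as a 4-bit binary group."""
    if number < 0:
        raise ValueError("BCD as described here covers nonnegative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(number))

for value in (90, 255, 2007):
    print(value, "->", to_bcd(value))
```

Note that BCD is less compact than regular binary: 255 needs 12 bits here but only 8 bits in sequential binary code.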


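EXAMPLE 2.3 NUMBER SYSTEMS #1 


Write a table with the regular binary code, octal code, hexadecimal code, Gray code, and BCD code 
for the decimals 0 to 15. 


SOLUTION 


The solution is presented in Figure 2.3. 


Decimal   Binary   Octal   Hexa   Gray   BCD 
0         0000     0       0      0000   0000 
1         0001     1       1      0001   0001 
2         0010     2       2      0011   0010 
3         0011     3       3      0010   0011 
4         0100     4       4      0110   0100 
5         0101     5       5      0111   0101 
6         0110     6       6      0101   0110 
7         0111     7       7      0100   0111 
8         1000     10      8      1100   1000 
9         1001     11      9      1101   1001 
10        1010     12      A      1111   0001 0000 
11        1011     13      B      1110   0001 0001 
12        1100     14      C      1010   0001 0010 
13        1101     15      D      1011   0001 0011 
14        1110     16      E      1001   0001 0100 
15        1111     17      F      1000   0001 0101 


FIGURE 2.3. Codes representing decimal numbers from 0 to 15 (Example 2.3). 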


EXAMPLE 2.4 NUMBER SYSTEMS #2 

Given the decimal numbers 0, 9, 99, and 999, determine: 

a. The minimum number of bits needed to represent them. 

b. Their regular binary representation. 

c. Their hexadecimal representation. 

d. Their BCD representation. 


SOLUTION 

The solution is shown in the table below. 

Decimal   # of bits   Binary code   Hexa code   BCD code 
0         1           0             0           0000 
9         4           1001          9           1001 
99        7           1100011       63          1001 1001 
999       10          1111100111    3E7         1001 1001 1001 


2.5 Codes for Negative Numbers 


There are several codes for representing negative numbers. The best known are sign-magnitude, one’s 
complement, and two's complement. However, the hardware required to perform arithmetic operations 
with these numbers is simpler when using two’s complement, so in practice this is basically the only one 
used for integers. 


2.5.1 Sign-Magnitude Code 


In this case, the MSB represents the sign ('0'=plus, 'l'=minus). Consequently, it does not take part in 
the sequential (weighted) binary encoding of the number, and two representations result for 0. Some 
examples are given below. 


"0000" = +0 
"1000" =—0 
"00111" =+7 
"10111" =-7 


"01000001" = +65 
"11000001" =-65 


The range of decimals covered by an N-bit sign-magnitude code is given below, where x is an integer:


$-(2^{N-1}-1) \le x \le 2^{N-1}-1$ (2.4)
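
The sign-magnitude rule can be sketched in a few lines of Python. The function names below are assumptions for illustration, and the sketch assumes a word length of at least two bits (one sign bit plus at least one magnitude bit).

```python
def to_sign_magnitude(x: int, n_bits: int) -> str:
    """Encode x with one sign bit followed by an (n_bits - 1)-bit magnitude."""
    limit = 2 ** (n_bits - 1) - 1            # bound from Equation 2.4
    if abs(x) > limit:
        raise ValueError(f"{x} is outside [-{limit}, {limit}]")
    sign = "1" if x < 0 else "0"
    return sign + format(abs(x), f"0{n_bits - 1}b")


def from_sign_magnitude(bits: str) -> int:
    """Decode a sign-magnitude bit string; both "00...0" and "10...0" give 0."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(to_sign_magnitude(7, 5), to_sign_magnitude(-7, 5))
print(to_sign_magnitude(65, 8), to_sign_magnitude(-65, 8))
print(from_sign_magnitude("0000"), from_sign_magnitude("1000"))
```

Note that a Python integer cannot distinguish +0 from -0, so the encoder only ever produces "00...0" for zero, while the decoder maps both zero patterns to 0.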


2.5.2 One’s Complement Code 


If the MSB is '0', then the number is positive. Its negative counterpart is obtained by simply
complementing (reversing) all bits (again, a '1' results in the MSB position when the number is negative). Like
sign-magnitude, two representations result for 0, called +0 ("00...0") and -0 ("11...1"). Some examples
are shown below.

"0000" = +0 (regular binary code)
"1111" = -0 (because its complement is "0000" = 0)
"0111" = +7 (regular binary code)
"1000" = -7 (because its complement is "0111" = 7)
"01000001" = +65 (regular binary code)
"10111110" = -65 (because its complement is "01000001" = 65)


Formally speaking, this code can be represented in several ways. Say, for example, that a is a positive
number, and we want to find its negative counterpart, b. Then the following is true:


For a and b in binary form: $b = a'$ (2.5)
For a and b in signed decimal form: $b = -a$ (2.6)
For a and b in unsigned decimal form: $b = 2^N - 1 - a$ (2.7)


Equation 2.5 is the definition of one’s complement in the binary domain, while Equation 2.6 is the
definition of negation for signed numbers (thus equivalent to one’s complement in this case) in the
decimal domain. Equation 2.7, on the other hand, determines the value of b as if it were unsigned (that
is, as if the numbers ranged from 0 to $2^N-1$).

To check Equation 2.7, let us take a=7 with N=4. Then $b = 2^4 - 1 - a = 8$, which in unsigned form is
represented as 8="1000", indeed coinciding with the representation of -7 just seen above.

The range of decimals covered by an N-bit one’s complement-based code is also given by Equation 2.4. 
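
The relationship between bit reversal and the unsigned reading of the result (Equation 2.7) can be checked with a short Python sketch. The function name `ones_complement` is an assumption for illustration.

```python
def ones_complement(bits: str) -> str:
    """Complement (reverse) every bit of the string (Equation 2.5)."""
    return "".join("1" if bit == "0" else "0" for bit in bits)


for positive in ("0000", "0111", "01000001"):
    negative = ones_complement(positive)
    n_bits = len(positive)
    a = int(positive, 2)
    # Equation 2.7: read as unsigned, the negative pattern equals 2^N - 1 - a.
    assert int(negative, 2) == 2 ** n_bits - 1 - a
    print(positive, "->", negative)
```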


2.5.3 Binary Addition 


To explain the next code for negative numbers, called two's complement, knowledge of binary addition 
is needed. Because binary arithmetic functions will only be seen in the next chapter, an introduction to 
binary addition is here presented. 

Binary addition is illustrated in Figure 2.4, which shows the simplest possible case, consisting of
two single-bit inputs. The corresponding truth table is presented in (a), where a and b are the bits to be
added, sum is the result, and carry is the carry-out bit. Analogously to the case of decimal numbers, in
which addition is a modulo-10 operation, in binary systems it is a modulo-2 operation. Therefore, when
the result reaches 2 (last line of the truth table), 2 is subtracted from it and a carry-out occurs. From (a) we
conclude that sum and carry can be computed by an XOR and an AND gate, respectively, shown in (b).

The general case (three inputs) is depicted in Figure 2.5, in which a carry-in bit (cin) is also included.
In (a), the traditional addition assembly is shown, with $a = a_3a_2a_1a_0$ and $b = b_3b_2b_1b_0$ representing
two 4-bit numbers to be added, producing a 5-bit sum vector, $sum = s_4s_3s_2s_1s_0$, and a 4-bit carry vector,
$carry = c_4c_3c_2c_1$. The algorithm is summarized in the truth table shown in (b) (recall that it is a modulo-2
operation). In (c), an example is presented in which "1101" (=13) is added to "0111" (=7), producing
"10100" (=20) at the output. The carry bits produced during the additions are also shown.

We can now proceed and introduce the two’s complement code for negative numbers. 


(a)  a  b | sum  carry
     0  0 |  0     0
     0  1 |  1     0
     1  0 |  1     0
     1  1 |  0     1

(b)  sum = a XOR b,  carry = a AND b


FIGURE 2.4. Two-input binary addition: (a) Truth table; (b) Sum and carry computations.


(a)    c4 c3 c2 c1        (carry)
          a3 a2 a1 a0
     +    b3 b2 b1 b0
       s4 s3 s2 s1 s0     (sum)

(b)  cin  a  b | sum  cout
      0   0  0 |  0    0
      0   0  1 |  1    0
      0   1  0 |  1    0
      0   1  1 |  0    1
      1   0  0 |  1    0
      1   0  1 |  0    1
      1   1  0 |  0    1
      1   1  1 |  1    1

(c)    1  1  1  1         (carry)
          1  1  0  1      (13)
     +    0  1  1  1      (7)
       1  0  1  0  0      (20)


FIGURE 2.5. Three-input binary addition: (a) Traditional addition assembly; (b) Truth table; (c) Addition 
example. 
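
The bit-by-bit procedure of Figure 2.5 can be sketched in Python as shown below. The sum bit follows the XOR rule described in the text; the carry-out expression used here (a carry occurs when at least two of the three inputs are '1') is read off the three-input truth table and is stated as an assumption of this sketch, as are the function names.

```python
def full_adder(a: int, b: int, cin: int) -> tuple:
    """One column of the addition: returns (sum, carry-out) for three input bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)   # at least two inputs are 1
    return s, cout


def add_bits(a: str, b: str) -> tuple:
    """Add two equal-length bit strings from right to left.

    Returns (sum vector including the final carry as its MSB, carry vector).
    """
    carry = 0
    sum_bits, carries = [], []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        s, carry = full_adder(int(bit_a), int(bit_b), carry)
        sum_bits.append(str(s))
        carries.append(str(carry))
    sum_bits.append(str(carry))
    return "".join(reversed(sum_bits)), "".join(reversed(carries))


total, carries = add_bits("1101", "0111")
print(total, carries, int(total, 2))
```

For the operands of Figure 2.5(c), the sketch reproduces the 5-bit sum and the 4-bit carry vector described in the text.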


2.5.4 Two’s Complement Code 


Due to the simplicity of the required hardware (described in Chapter 12), this is the option for
representing negative numbers adopted in practically all computers and other digital systems. In
it, the binary representation of a negative number is obtained by taking its positive representation
and complementing (reversing) all bits, then adding one to it. For example, to obtain the 5-bit
representation of -7 we start with +7 ("00111"), then flip all bit values (→ "11000") and add '1' to the
result (→ "11001").


EXAMPLE 2.5 TWO’S COMPLEMENT


Using 8-bit numbers, find the two’s complement representation for the following decimals: -1, -4,
and -128.


SOLUTION


For -1: Start with +1 ("00000001"), complement it ("11111110"), then add 1 ("11111111"). Observe that
signed -1 corresponds to unsigned 255.


For -4: Start with +4 ("00000100"), complement it ("11111011"), then add 1 ("11111100"). Note that signed
-4 corresponds to unsigned 252.


For -128: Start with +128 ("10000000"), complement it ("01111111"), then add 1 ("10000000"). Note that
signed -128 corresponds to unsigned 128.
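
The three conversions of this example can be reproduced with a short Python sketch of the complement-and-add-one rule. The function name is an assumption for illustration; the result is kept within the original word length by discarding any carry out of the MSB.

```python
def negate_twos_complement(bits: str) -> str:
    """Complement all bits, add one, and keep the result within the same word length."""
    n_bits = len(bits)
    all_ones = 2 ** n_bits - 1
    inverted = int(bits, 2) ^ all_ones
    return format((inverted + 1) & all_ones, f"0{n_bits}b")


for magnitude in (1, 4, 128):
    start = format(magnitude, "08b")
    result = negate_twos_complement(start)
    print(f"-{magnitude}: {start} -> {result} (unsigned {int(result, 2)})")
```

The unsigned values printed at the end of each line are the ones noted in the solution (255, 252, and 128).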


An interesting conclusion can be drawn from the example above: The sum of the magnitude of the
signed number with its unsigned value always adds to $2^N$, where N is the number of bits. Hence, a set of
equations similar to (2.5)-(2.7) can be written for two’s complement systems.


For a and b in binary form: $b = a' + 1$ (2.8)
For a and b in signed decimal form: $b = -a$ (2.9)
For a and b in unsigned decimal form: $b = 2^N - a$ (2.10)


Bit weights:  $-2^{N-1}$   $2^{N-2}$   ...   $2^2$   $2^1$   $2^0$
              MSB                                    LSB


FIGURE 2.6. Relationship between two’s complement and decimal representations. 


The relationship between two’s complement and signed decimal representations is further illustrated
in Figure 2.6, which shows that indeed only a minus sign must be appended to the weight of the MSB.
Therefore, another way of converting a two’s complement number to a decimal signed number is the following:


$x = -a_{N-1} 2^{N-1} + \sum_{k=0}^{N-2} a_k 2^k$ (2.11)


For example, say that a="100010". Then the corresponding signed decimal value obtained with
Equation 2.11 is $x = -a_5 2^5 + a_4 2^4 + a_3 2^3 + a_2 2^2 + a_1 2^1 + a_0 2^0 = -32 + 2 = -30$.
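
Equation 2.11 translates directly into code. The Python sketch below (the function name `signed_value` is an assumption for illustration) gives the MSB a negative weight and all other bits their usual positive weights.

```python
def signed_value(bits: str) -> int:
    """Equation 2.11: the MSB has weight -2^(N-1); the other bits keep their usual weights."""
    n_bits = len(bits)
    value = -int(bits[0]) * 2 ** (n_bits - 1)
    for k in range(n_bits - 1):
        value += int(bits[n_bits - 1 - k]) * 2 ** k   # a_k is the k-th bit from the right
    return value


print(signed_value("100010"))
print([signed_value(format(i, "03b")) for i in range(8)])
```

The second line lists the signed value of every 3-bit word, from "000" to "111".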


EXAMPLE 2.6 SIGNED AND UNSIGNED DECIMALS


Given a 3-bit binary code, write the corresponding unsigned and signed decimals that it can represent. 
For the signed part, consider that two’s complement has been employed. 


SOLUTION 


The solution is shown in Figure 2.7. 


Binary   Unsigned   Signed
word     decimal    decimal
000      0          0
001      1          1
010      2          2
011      3          3
100      4          -4
101      5          -3
110      6          -2
111      7          -1

FIGURE 2.7. Solution of Example 2.6.


The range of decimals covered by an N-bit two’s complement-based code is given below, where x is 
an integer. Note that this range is asymmetrical and is larger than that in Equation 2.4 because now there 
is only one representation for zero. 


$-2^{N-1} \le x \le 2^{N-1}-1$ (2.12)


As with any other binary representation, an N-bit two’s complement number can be extended (more
bits) or truncated (fewer bits). To extend it from N to M (> N) bits, the sign bit must be repeated M-N times
on the left (this operation is called sign extension). To truncate it from N down to M (< N) bits, the N-M
leftmost bits must be removed; however, the result will only be valid if the N-M+1 leftmost bits are
equal. These procedures are illustrated in the example below.


EXAMPLE 2.7 TWO’S COMPLEMENT EXTENSION AND TRUNCATION


Say that a="00111" (=7) and b="11100"(=-4) are binary values belonging to a two’s complement- 
based signed system. Perform the operations below and verify the correctness of the results. 


a. 5-to-7 extension of a and b. 
b. 5-to-4 truncation of a and b. 


c. 5-to-3 truncation of a and b. 


SOLUTION 


Part (a):
5-to-7 extension of a: "00111" (=7) → "0000111" (=7)
5-to-7 extension of b: "11100" (=-4) → "1111100" (=-4)
Sign extensions always produce valid results.


Part (b):
5-to-4 truncation of a: "00111" (=7) → "0111" (=7) Correct.
5-to-4 truncation of b: "11100" (=-4) → "1100" (=-4) Correct.


Part (c):
5-to-3 truncation of a: "00111" (=7) → "111" (=-1) Incorrect.
5-to-3 truncation of b: "11100" (=-4) → "100" (=-4) Still correct.
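
The extension and truncation rules, including the validity test on the N-M+1 leftmost bits, can be sketched in Python as follows. The function names are assumptions for illustration, and the sketch assumes that the target length is not smaller than the word for extension and not larger than the word for truncation.

```python
def sign_extend(bits: str, m: int) -> str:
    """Repeat the sign bit on the left until the word has m bits (m >= len(bits))."""
    return bits[0] * (m - len(bits)) + bits


def truncate(bits: str, m: int) -> tuple:
    """Drop the N-M leftmost bits; the result is valid only if the N-M+1 leftmost bits agree."""
    n = len(bits)
    leading = bits[: n - m + 1]
    is_valid = len(set(leading)) == 1
    return bits[n - m:], is_valid


a, b = "00111", "11100"
print(sign_extend(a, 7), sign_extend(b, 7))
print(truncate(a, 4), truncate(b, 4))
print(truncate(a, 3), truncate(b, 3))
```

The boolean returned by `truncate` flags the single invalid case of this example, the 5-to-3 truncation of a.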


As mentioned earlier, because of the simpler hardware, the two’s complement option for representing 
negative numbers is basically the only one used in practice. Its usage will be illustrated in detail in the 
next chapter, when studying arithmetic functions, and in Chapter 12, when studying physical imple- 
mentations of arithmetic circuits (adders, subtracters, multipliers, etc.). 


2.6 Floating-Point Representation 


Previously we described codes for representing unsigned and signed integers. However, in many 
applications it is necessary to deal with real-valued numbers. To represent them, the IEEE 754 standard 
is normally employed, which includes two options that are shown in Figure 2.8. 


2.6.1 IEEE 754 Standard 


The option in Figure 2.8(a) is called single-precision floating-point. It has a total of 32 bits, with one bit devoted 
to the sign (S), 8 bits to the exponent (E), and 23 bits devoted to the fraction (F). It is assumed to be represented 
using normalized scientific notation, that is, with exactly one nonzero digit before the binary point. 

(a) S (1 bit) | Exponent E (8 bits) | Fraction F (23 bits)
(b) S (1 bit) | Exponent E (11 bits) | Fraction F (52 bits)


FIGURE 2.8. (a) Single- and (b) double-precision floating-point representations (IEEE 754 standard).


The corresponding decimal value (y) is determined by the expression below, where E is the biased
exponent, while e is the actual exponent (e=E-127).



Single precision:
$y = (-1)^S (1+F)\, 2^{E-127}$ (2.13)
Where, for normalized numbers:
$1 \le E \le 254$ or $-126 \le e \le 127$ (2.14)


The option in Figure 2.8(b) is called double-precision floating-point. It has a total of 64 bits, with 1 bit 
for the sign, 11 bits for the exponent, and 52 bits for the fraction. The corresponding decimal value (y) is 
determined by the expression below, where again E is the biased exponent, while e is the actual exponent 
(now given by e=E-1023). 

Double precision:

$y = (-1)^S (1+F)\, 2^{E-1023}$ (2.15)

Where, for normalized numbers:

$1 \le E \le 2046$ or $-1022 \le e \le 1023$ (2.16)


The exponent is biased because the actual exponent must be signed to be able to represent very small 
numbers (thus with a negative exponent) as well as very large numbers (with a positive exponent). Because 
the actual binary representation of E is unsigned, a bias is included, so e can indeed be positive or negative. 
The allowed ranges for E and e (for normalized numbers) were shown in Equations 2.14 and 2.16. 

Note that, contrary to the representation for negative integers, which normally employs two’s comple- 
ment, a negative value is represented in a way similar to the sign-magnitude option described in the previ- 
ous section, that is, a negative number has exactly the same bits as its positive counterpart, with the only 
difference in the sign bit. 

The (1+ F) term that appears in the equations above is the significand. Note that the '1' in this term is 
not included in the actual binary vector because it is assumed that the number is stored using normalized 
scientific notation, that is, with exactly one nonzero digit before the binary point. Therefore, because the 
only nonzero element in binary systems is 1, there is no need to store it. Consequently, the significand’s 
actual resolution is 24 bits in single precision and 53 bits in double precision. 

One limitation of floating-point is that the equations do not produce the value zero, so a special repre- 
sentation must be reserved for it, which consists of filling the whole E and F fields with zeros. This case 
is shown in the table of Figure 2.9 (first line). Note that there are two mathematically equivalent zeros, 
called +0 and -0, depending on the value of S. 

The second line in Figure 2.9 shows the representation for infinity, which consists of filling E with '1's and
F with '0's, with the sign determined by S. The third line shows a representation that does not correspond
to a number (indicated by NaN = Not a Number), which occurs when E is maximum (filled with '1's) and
F ≠ 0; this case is useful for representing invalid or indeterminate operations (like 0 ÷ 0, ∞ - ∞, etc.). The
fourth line shows the representation for denormalized numbers, which occurs when E = 0 and F ≠ 0. Finally,
the fifth line shows the regular representation for normalized numbers, whose only condition is to have
0 < E < max, where max = 255 for single precision and max = 2047 for double precision (that is, E = "11...1").


Sign (S)   Exponent (E)   Fraction (F)   Value (y)
0/1        0              0              +0 / -0
0/1        max            0              +∞ / -∞
0/1        max            ≠ 0            NaN
0/1        0              ≠ 0            Denormalized
0/1        0 < E < max    Any            Normalized

max = 255 for single precision or 2047 for double precision
NaN = Not a number


FIGURE 2.9. Set of possible representations with the IEEE 754 floating-point standard.

The numeric values that can be represented using floating-point notation are then as follows:
Single precision:
$y = -\infty$, $-2^{128} < y \le -2^{-126}$, $y = \pm 0$, $+2^{-126} \le y < +2^{128}$, $y = +\infty$ (2.17)
Double precision:
$y = -\infty$, $-2^{1024} < y \le -2^{-1022}$, $y = \pm 0$, $+2^{-1022} \le y < +2^{1024}$, $y = +\infty$ (2.18)


EXAMPLE 2.8 FLOATING-POINT REPRESENTATION #1


Determine the decimal values corresponding to the binary single-precision floating-point vectors 
shown in Figure 2.10. (Note that to make the representations cleaner, in all floating-point exercises 
we drop the use of quotes for bits and bit vectors.) 


Bit weights:  S | $2^7$ ... $2^0$ (exponent) | $2^{-1}$ $2^{-2}$ ... $2^{-23}$ (fraction)

(a) 1 | 01111111 | 100 0...0
(b) 0 | 10000001 | 010 0...0


FIGURE 2.10. Floating-point representations for Example 2.8. 


SOLUTION
a. S=1, E=127, and F=0.5. Therefore, $y = (-1)^1 (1+0.5)\, 2^{127-127} = -1.5$.
b. S=0, E=129, and F=0.25. Therefore, $y = (-1)^0 (1+0.25)\, 2^{129-127} = 5$.


EXAMPLE 2.9 FLOATING-POINT REPRESENTATION #2

Determine the binary single-precision floating-point representations for the following decimals:


a. 0.75
b. -2.625
c. 37/32


SOLUTION
a. $0.75 = \frac{1}{2} + \frac{1}{4} = \frac{3}{4} = 3 \cdot 2^{-2} = 11_2 \cdot 2^{-2} = 1.1_2 \cdot 2^{-1}$
Therefore, S=0, F=1000...0, and E=01111110 (because E=e+127=-1+127=126).
Note: The subscript ‘2’ used above to distinguish a binary value from a decimal (default) value
will be omitted in all representations that follow.
b. $-2.625 = -(2 + \frac{1}{2} + \frac{1}{8}) = -\frac{21}{8} = -21 \cdot 2^{-3} = -10101 \cdot 2^{-3} = -1.0101 \cdot 2^{1}$
Therefore, S=1, F=010100...0, and E=10000000 (because E=e+127=1+127=128).
c. $\frac{37}{32} = 37 \cdot 2^{-5} = 100101 \cdot 2^{-5} = 1.00101 \cdot 2^{0}$
Therefore, S=0, F=0010100...0, and E=01111111 (because E=e+127=0+127=127).
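
Both directions worked out in Examples 2.8 and 2.9 can be cross-checked with Python's standard `struct` module, which packs a value into the 32-bit single-precision format. In the sketch below, `decode_single` evaluates Equation 2.13 by hand (normalized numbers only), while `single_fields` asks the platform for the stored S, E, and F fields; both function names are assumptions for illustration.

```python
import struct


def decode_single(bits: str) -> float:
    """Evaluate Equation 2.13 for a normalized 32-bit pattern given as a bit string."""
    bits = bits.replace(" ", "")
    sign = int(bits[0])
    biased_exponent = int(bits[1:9], 2)        # E
    fraction = int(bits[9:], 2) / 2 ** 23      # F, a value in [0, 1)
    return (-1) ** sign * (1 + fraction) * 2.0 ** (biased_exponent - 127)


def single_fields(y: float) -> tuple:
    """Return the (S, E, F) bit strings stored for y in single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", y))
    bits = format(word, "032b")
    return bits[0], bits[1:9], bits[9:]


# Example 2.8: from bit fields to decimal values
print(decode_single("1 01111111 " + "1" + "0" * 22))
print(decode_single("0 10000001 " + "01" + "0" * 21))

# Example 2.9: from decimal values to bit fields
for y in (0.75, -2.625, 37 / 32):
    print(y, single_fields(y))
```

All three values of Example 2.9 are exactly representable in single precision, so no rounding is involved in this check.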


2.6.2 Floating-Point versus Integer 


Floating-point has a fundamental feature that integers do not have: the ability to represent very large
as well as very small numbers. For example, when using single-precision floating point we saw that the
range is $\pm 2^{-126}$ to near $\pm 2^{128}$, which is much wider than that covered with 32-bit integers, that is, $-2^{31}$ to
$2^{31}-1$. However, there has to be a price to pay for that, and that is precision.

To illustrate it, the example below shows two 32-bit representations: the first ($y_1$) using integer
format and the second ($y_2$) using floating-point format. Because the fraction in the latter can only have
23 bits, truncation is needed, suppressing the last 8 bits of $y_1$ to construct $y_2$. In this example, the case of
minimum error (within the integer range) is illustrated, in which only the last bit in the last 8-bit string
of $y_1$ is '1'.

Representation of a 32-bit integer:


$y_1$ = 11111111 11111111 11111111 00000001
= 1.1111111 11111111 11111111 00000001 $\cdot 2^{31}$


Corresponding single-precision floating-point representation (F has 23 bits):

$y_2$ = 1.1111111 11111111 11111111 $\cdot 2^{31}$ (S=0, F=111...1, E=158)

Consequently, the following error results:

$(y_1 - y_2)/y_1$ = (00000000 00000000 00000000 00000001)$/y_1 = 1/y_1 \approx 2^{-32}$
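
This loss of precision can be observed with Python's `struct` module. The sketch below treats the 32-bit pattern as an unsigned integer (an assumption, since the pattern is read as a positive value above) and converts it to single precision. Note that `struct` rounds to the nearest representable value rather than truncating; for this particular input both give the same result.

```python
import struct

y1 = 0xFFFFFF01                                   # 11111111 11111111 11111111 00000001
packed = struct.pack(">f", float(y1))             # nearest single-precision value
(y2,) = struct.unpack(">f", packed)
(word,) = struct.unpack(">I", packed)

print(format(word, "032b"))                       # stored sign, exponent, and fraction bits
print(int(y2) == 0xFFFFFF00)                      # the low 8 bits of y1 are lost
print((y1 - y2) / y1, 2.0 ** -32)                 # relative error next to 2^-32
```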


The differences between integer and floating-point representations are further illustrated in the
hypothetical floating-point system described in the example below.


EXAMPLE 2.10 HYPOTHETICAL FLOATING-POINT SYSTEM


Consider the 6-bit floating-point (FP) representation shown in Figure 2.11(a), which assigns 1 bit for 
the sign, 3 bits for the exponent, and 2 bits for the fraction, having the exponent biased by 1 (that is, 
e=E-1; see the equation in Figure 2.11(a)). 


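(a) S (1 bit) | Exponent E (3 bits) | Fraction F (2 bits)

    $y = (-1)^S (1+F)\, 2^{E-1}$

(b) Values: ±0.5, ±0.625, ±0.75, ±0.875, ±1, ±1.25, ±1.5, ±1.75, ±2, ±2.5, ±3, ±3.5, ±4, ±5, ±6, ±7,
    ±8, ±10, ±12, ±14, ±16, ±20, ±24, ±28, ±32, ±40, ±48, ±56, ±64, ±80, ±96, ±112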


FIGURE 2.11. Hypothetical 6-bit floating-point system for Example 2.10. 


a. List all values that this system can produce. 
b. How many values are there? Is this quantity different from that in a 6-bit integer representation? 
c. Which system can represent smaller and larger numbers? Comment on the respective resolutions. 


SOLUTION 


a. Using the expression given in Figure 2.11(a), the values listed in Figure 2.11(b) are obtained,
which range from ±0.5 to ±112.


b. There are 64 values, which is the same quantity as for a 6-bit integer system, that is, $2^6$=64. This
was expected because that is the total amount of information that 6 bits can convey.


c. This is the most interesting part because it makes the differences between FP and integer clear.
For 6-bit signed integers, the values are -32, -31, ..., -1, 0, 1, ..., 30, 31, which are uniformly
distributed. FP, on the other hand, produces more concentrated values (better resolution) around
zero and more widely spaced values (poorer resolution) toward the range ends. The extreme values in this
example are -32 and +31 for integer and ±112 for FP, so FP exhibits a wider range. Moreover, small
values, like 0.5, cannot be represented with integers but can with FP. On the other hand, for large
numbers the opposite happens, where values like 9, 11, 13, etc., can be represented with integers
but cannot with FP. Putting it all together: There is no magic; these are simply different
representation systems whose choice is dictated by the application.
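
The conclusions of this example can be checked by enumerating the whole system. The sketch below assumes the expression $y = (-1)^S (1+F)\, 2^{E-1}$ with a 2-bit fraction field (so F is 0, 0.25, 0.5, or 0.75) and no codewords reserved for special values; the function name `fp6_value` is hypothetical.

```python
def fp6_value(s: int, e: int, f: int) -> float:
    """y = (-1)^S (1 + F) 2^(E - 1), with F = f/4 for the 2-bit fraction field."""
    return (-1) ** s * (1 + f / 4) * 2.0 ** (e - 1)


values = sorted(fp6_value(s, e, f) for s in (0, 1) for e in range(8) for f in range(4))
positive = [v for v in values if v > 0]

print(len(values), len(set(values)))      # number of codewords and of distinct values
print(positive[0], positive[-1])          # smallest and largest magnitudes
print(positive[:6])                       # fine steps near zero
print(positive[-6:])                      # coarse steps near the range end
print([n for n in range(1, 32) if n not in values])   # integers the FP system skips
```

The last line lists the integers between 1 and 31 that the 6-bit integer system can represent but the floating-point system cannot.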


The use of floating-point representation in computers will be seen in the next chapter when studying 
arithmetic functions. Truncation and rounding will also be described there. 


2.7 ASCII Code 


We previously saw several binary codes that can be used to represent decimal numbers (both integer and 
real-valued). There are also several codes for representing characters, that is, letters, numbers, punctua- 
tion marks, and other special symbols used in a writing system. The two main codes in this category are 
ASCII and Unicode. 


2.7.1 ASCII Code 


The ASCII (American Standard Code for Information Interchange) code was introduced in the 1960s. 
It contains 128 7-bit codewords (therefore represented by decimals from 0 to 127) that are shown in 
Figure 2.12. This set of characters is also known as Basic Latin. 

The first two columns (decimals 0-31) are indeed for control only, which, along with DEL (delete, 
decimal 127), total 33 nonprintable symbols. SP (decimal 32) is the space between words. For example, 
to encode the word “Go” with this code, the following bit string would be produced: "1000111 1101111". 
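
The same bit string can be obtained in Python, whose built-in `ascii` codec maps each character to its decimal code. The helper name `ascii_bits` is an assumption for illustration.

```python
def ascii_bits(text: str) -> str:
    """Return the 7-bit ASCII codeword of each character, separated by spaces."""
    return " ".join(format(code, "07b") for code in text.encode("ascii"))


print(ascii_bits("Go"))
print([ord(ch) for ch in "Go"])   # the corresponding decimal codes
```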


          b6b5b4
b3b2b1b0  000  001  010  011  100  101  110  111
0000      NUL  DLE  SP   0    @    P    `    p
0001      SOH  DC1  !    1    A    Q    a    q
0010      STX  DC2  "    2    B    R    b    r
0011      ETX  DC3  #    3    C    S    c    s
0100      EOT  DC4  $    4    D    T    d    t
0101      ENQ  NAK  %    5    E    U    e    u
0110      ACK  SYN  &    6    F    V    f    v
0111      BEL  ETB  '    7    G    W    g    w
1000      BS   CAN  (    8    H    X    h    x
1001      HT   EM   )    9    I    Y    i    y
1010      LF   SUB  *    :    J    Z    j    z
1011      VT   ESC  +    ;    K    [    k    {
1100      FF   FS   ,    <    L    \    l    |
1101      CR   GS   -    =    M    ]    m    }
1110      SO   RS   .    >    N    ^    n    ~
1111      SI   US   /    ?    O    _    o    DEL


FIGURE 2.12. ASCII code.


2.7.2 Extended ASCII Code


Extended ASCII is an 8-bit character code that includes the standard ASCII code in its first 128 positions 
along with 128 additional characters. 

The additional 128 codewords allow the inclusion of symbols needed in languages other than English, 
like the accented characters of French, Spanish, and Portuguese. However, these additional characters 
are not standardized, so a document created using a certain language might look strange when opened 
using a word processor in a country with a different language. This type of limitation was solved with 
Unicode. 


2.8 Unicode 


Unicode was proposed in 1993 with the intention of attaining a real worldwide standard code for characters. 
Its current version (5.0, released in 2006) contains ~99,000 printable characters, covering almost all writing 
systems on the planet. 


2.8.1 Unicode Characters 


Each Unicode point is represented by a unique decimal number, so contrary to Extended ASCII (whose 
upper set is not standardized), with Unicode the appearance of a document will always be the same 
regardless of the software used to create or read it (given that it supports Unicode, of course). Moreover, 
its first 128 characters are exactly those of ASCII, so compatibility is maintained. 

Unicode points are identified using the notation U+xx...x, where xx...x is either a decimal or
hexadecimal number. For example, U+0 (or U+0000$_{16}$) identifies the very first code point, while U+65,535 (or
U+FFFF$_{16}$) identifies the 65,536th point.

A total of 1,114,112 points (indexed from 0 to 1,114,111) are reserved for this code, which would in principle 
require 21 bits for complete representation. However, Unicode points are represented using multiples of one 
byte, so depending on the encoding scheme (described later), each point is represented by 1, 2, 3, or 4 bytes. 

The current list of Unicode characters takes the range from 0 to a little over 100,000, within which 
there is a small subrange with 2048 values (from 55,296 to 57,343) that are reserved for surrogate pairs 
(explained below), so they cannot be used for characters. Besides the surrogate range, Unicode has 
several other (above 100k) reserved ranges for control, formatting, etc. 

Two samples from the Unicode table are shown in Figure 2.13. The one on the left is from the very
beginning of the code (note that the first character’s address is zero), which is the beginning of the ASCII
code. The second sample shows ancient Greek numbers, whose corresponding decimals start at 65,856
(hexadecimal numbers are used in the table, so $10140_{16} = 1 \cdot 2^{16} + 0 \cdot 2^{12} + 1 \cdot 2^{8} + 4 \cdot 2^{4} + 0 \cdot 2^{0} = 65{,}856_{10}$).

The first subset of Unicode to gain popularity was a 652-character subset called Windows Glyph List 4, 
which covers most European languages (with the ASCII code obviously included). This subset has been 
supported by Windows and several other software programs since the mid-1990s. 

Unicode has three standardized encoding schemes called UTF-8 (Unicode transformation format 8), 
UTF-16, and UTF-32, all summarized in Figure 2.14. 


2.8.2 UTF-8 Encoding 


The UTF-8 encoding scheme uses variable-length codewords with 1, 2, 3, or 4 bytes. As shown in the
upper table of Figure 2.14, it employs only 1 byte when encoding the first 128 symbols (that is, the ASCII
symbols), then 2 bytes for characters between 128 and 2047, and so on.


Panels: "Controls and Basic Latin" (left) and "Ancient Greek Numbers" (right).

FIGURE 2.13. Two samples from Unicode. The one on the left is from the very beginning of the code table, 
which is the ASCII code. The second sample shows ancient Greek numbers (note that the corresponding deci- 
mals are in the 65,000 range—the numbers in the table are in hexadecimal format). 


UTF-8 Unicode encoding
Unicode point                         | Binary representation          | Byte 1    | Byte 2    | Byte 3    | Byte 4
0 to 127                              | 0000 0000 0aaa aaaa            | 0aaa aaaa |           |           |
128 to 2047                           | 0000 0bbb bbaa aaaa            | 110b bbbb | 10aa aaaa |           |
2048 to 55,295 and 57,344 to 65,535   | cccc bbbb bbaa aaaa            | 1110 cccc | 10bb bbbb | 10aa aaaa |
65,536 to 1M (Note 2)                 | 000d dddd cccc bbbb bbaa aaaa  | 1111 0ddd | 10dd cccc | 10bb bbbb | 10aa aaaa


UTF-16 Unicode encoding
Unicode point                         | Binary representation          | Byte 1    | Byte 2    | Byte 3    | Byte 4
0 to 55,295 and 57,344 to 65,535      | aaaa aaaa aaaa aaaa            | aaaa aaaa | aaaa aaaa |           |
65,536 to 1M                          | 000b bbbb aaaa aaaa aaaa aaaa  | 1101 10cc | ccaa aaaa | 1101 11aa | aaaa aaaa
                                      |                                | (Note 3)  |           |           |


UTF-32 Unicode encoding
Unicode point                         | Binary representation          | Byte 1    | Byte 2    | Byte 3    | Byte 4
0 to 55,295 and 57,344 to 1M          | 000a aaaa aaaa aaaa aaaa aaaa  | 0000 0000 | 000a aaaa | aaaa aaaa | aaaa aaaa


Note 1: a, b, c, and d are single bits.
Note 2: 1M=1,114,111.
Note 3: c=b-1 truncated on the left to 4 bits.


FIGURE 2.14. Standard Unicode encoding schemes (UTF-8, UTF-16, UTF-32).

Note that the surrogate subrange mentioned above is excluded. Note also that the encoding is not
continuous (there are jumps in the middle) so that a single byte always starts with '0', while multiple
bytes always start with "10", except byte 1, which starts with "11". This causes a UTF-8 bit stream to be
less affected by errors than the other encodings described below (for example, if an error occurs
during data transmission or storage, the system resynchronizes at the next correct character). Moreover, in
traditional languages, UTF-8 tends to produce shorter files because most characters are from the ASCII
code. On the other hand, the length of individual characters is highly unpredictable.


EXAMPLE 2.11 UTF-8 UNICODE ENCODING


a. Determine the decimal number that represents the Unicode point whose binary representation is 
"0001 1000 0000 0001". 


b. Determine the binary UTF-8 encoding string for the Unicode point above. 


SOLUTION 
a. $U = 2^{12} + 2^{11} + 1 = 6145$.


b. This point (6145) is in the third line of the corresponding table from Figure 2.14. Therefore,
cccc="0001", bbbbbb="100000", and aaaaaa="000001". Consequently, the following UTF-8 string
results: "1110 0001 1010 0000 1000 0001".
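
The byte layouts of the UTF-8 table can be written out as a short Python sketch and compared with Python's built-in UTF-8 codec. The function name `utf8_encode` is an assumption for illustration; the masks and shifts simply place the a, b, c, and d bit groups into the byte patterns of Figure 2.14.

```python
def utf8_encode(point: int) -> bytes:
    """Encode one Unicode point following the 1-, 2-, 3-, and 4-byte UTF-8 layouts."""
    if not 0 <= point <= 0x10FFFF or 0xD800 <= point <= 0xDFFF:
        raise ValueError("not an encodable Unicode point (out of range or surrogate)")
    if point < 0x80:                                   # 0aaa aaaa
        return bytes([point])
    if point < 0x800:                                  # 110b bbbb, 10aa aaaa
        return bytes([0xC0 | (point >> 6), 0x80 | (point & 0x3F)])
    if point < 0x10000:                                # 1110 cccc, 10bb bbbb, 10aa aaaa
        return bytes([0xE0 | (point >> 12),
                      0x80 | ((point >> 6) & 0x3F),
                      0x80 | (point & 0x3F)])
    return bytes([0xF0 | (point >> 18),                # 1111 0ddd, then three 10xx xxxx bytes
                  0x80 | ((point >> 12) & 0x3F),
                  0x80 | ((point >> 6) & 0x3F),
                  0x80 | (point & 0x3F)])


encoded = utf8_encode(6145)
print(" ".join(format(byte, "08b") for byte in encoded))
print(encoded == chr(6145).encode("utf-8"))
```

The second printed line compares the hand-built bytes with the result of Python's own encoder for the same point.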


Note: In most examples throughout the text the subscript that specifies the base is omitted because
its identification in general is obvious from the numbers. However, the base will always be explicitly
indicated when dealing with hexadecimal numbers because they are more prone to confusion (either the
subscript “16” or an “h” following the number will be employed).


2.8.3 UTF-16 Encoding 


UTF-16 encoding also employs variable-length codewords but now with 2 or 4 bytes (Figure 2.14).
This code is derived from the extinct 16-bit fixed-length encoding scheme, which had only $2^{16}$ = 65,536
codewords. This part of the code (from 0 to 65,535) is called basic multilingual plane (BMP). When a Unicode
point above this range is needed, a surrogate pair is used, which is a pair taken from the reserved
surrogate range mentioned earlier. In other words, Unicode points higher than 65,535 are encoded with two
16-bit words. In most languages, such characters rarely occur, so the average length is near 16 bits per
character (longer than UTF-8 and more subject to error propagation).

The first 16-bit word (bytes 1-2) in the surrogate pair is chosen from the first half of the surrogate
range (that is, 55,296-56,319 or D800$_{16}$-DBFF$_{16}$), while the second 16-bit word (bytes 3-4) is chosen from
the second half of the surrogate range (56,320-57,343 or DC00$_{16}$-DFFF$_{16}$). Consequently, because there
are 1024 words in each half, a total of $1024^2$=1,048,576 codewords result, which, added to the 65,536
codewords already covered with only two bytes, encompasses the whole range of decimals devoted to
Unicode, that is, from 0 to 1,114,111 (0000$_{16}$-10FFFF$_{16}$).

From the description above, it is easy to verify that the relationship between the Unicode decimal
value ($U$) and the decimals that represent the two words in the surrogate pair ($P_1$ for bytes 1-2, $P_2$ for
bytes 3-4) is the following:


$U = 1024(P_1 - 55{,}287) + P_2$ (2.19)
Or, reciprocally:
$P_1 = 55{,}296 + q$ (2.20)
$P_2 = U - 1024(P_1 - 55{,}287)$ (2.21)
Where q is the following quotient:
$q = \lfloor (U - 65{,}536)/1024 \rfloor$ (2.22)


EXAMPLE 2.12 UTF-16 UNICODE ENCODING


a. Determine the decimal corresponding to the Unicode point whose binary representation is 
"0001 0000 0000 0000 0000 0000". 


b. Determine the binary UTF-16 encoding string for the Unicode point above using Equations 2.19 
to 2.22. 


c. Repeat part (b) above using the table in Figure 2.14. 


SOLUTION 
a. $U = 2^{20} = 1{,}048{,}576$.


b. q=960, $P_1$=56,256 (="1101 1011 1100 0000"), and $P_2$=56,320 (="1101 1100 0000 0000"). Therefore,
$P_1P_2$="1101 1011 1100 0000 1101 1100 0000 0000" (= DB C0 DC 00).


c. From $U$="0001 0000 0000 0000 0000 0000" we determine that bbbbb="10000" and aa...a="00...0".
Thus c=b-1 (truncated on the left) is c="10000"-1="1111". Consequently, the same string shown
above results for $P_1P_2$.
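
Equations 2.19 to 2.22 can be checked against Python's built-in UTF-16 codec. In the sketch below the function names are assumptions for illustration, and the big-endian codec variant (`utf-16-be`) is used so that bytes 1-2 come first and no byte-order mark is added.

```python
def surrogate_pair(u: int) -> tuple:
    """Split a Unicode point above 65,535 into the two 16-bit words of its surrogate pair."""
    if not 65_536 <= u <= 1_114_111:
        raise ValueError("surrogate pairs are only used for points from 65,536 to 1,114,111")
    q = (u - 65_536) // 1024               # Equation 2.22
    p1 = 55_296 + q                        # Equation 2.20
    p2 = u - 1024 * (p1 - 55_287)          # Equation 2.21
    return p1, p2


def from_surrogates(p1: int, p2: int) -> int:
    """Recover the Unicode point from its surrogate pair (Equation 2.19)."""
    return 1024 * (p1 - 55_287) + p2


u = 1_048_576
p1, p2 = surrogate_pair(u)
encoded = p1.to_bytes(2, "big") + p2.to_bytes(2, "big")
print(p1, p2, encoded.hex(" ").upper())
print(encoded == chr(u).encode("utf-16-be"), from_surrogates(p1, p2) == u)
```

For the point of Example 2.12, the sketch prints the two decimals of part (b) followed by the four bytes in hexadecimal.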


2.8.4 UTF-32 Encoding 


UTF-32 employs a fixed-length codeword with 32 bits (see the corresponding table in Figure 2.14),
which often facilitates the allocation of resources. However, a few Unicode graphical symbols
result from the combination of two or more characters, so a truly constant length per symbol
is not possible. This encoding, of course, leads to files that are nearly twice the size of UTF-16 files.


EXAMPLE 2.13 UTF-32 UNICODE ENCODING


a. Determine the decimal number that represents the Unicode point whose binary representation is 
"0001 1000 0000 1110". 


b. Determine the binary UTF-32 encoding string for the Unicode point above. 
SOLUTION 

a. $U = 2^{12} + 2^{11} + 2^{3} + 2^{2} + 2^{1} = 6158$.

b. "0000 0000 0000 0000 0001 1000 0000 1110".


2.9 Exercises


1. Number of codewords

Consider an 8-bit system. 
a. How many codewords are there with exactly three '0's and five '1's? 


b. Write a table with the number of codewords as a function of the codeword’s Hamming weight 
(number of '1's in the codeword). For which Hamming weight is this number maximum? 


2. Binary to decimal conversion #1

Write the decimal numbers corresponding to the following unsigned binary representations: 
a. "0000 1111" (the space inside the string is just to make it easier to read) 

b. "0000 1111 0010" 

c. "1000 1000 0001 0001" 

3. Binary to decimal conversion #2

Write the decimal numbers corresponding to the following unsigned binary representations: 
a. "1000 1001" 

b. "1000 1111 0000" 

c. "0010 1000 0000 0001" 

4. Binary to hexadecimal conversion #1

Write the hexadecimal numbers corresponding to the following unsigned binary representations: 
a. "0000 1110" 

b. "00 1111 0010" 

c. "1000 1010 0001 0011" 

5. Binary to hexadecimal conversion #2

Write the hexadecimal numbers corresponding to the following unsigned binary representations: 
a. "100 1110" 

b. "0011 1111 0010" 

c. "11111 1010 0001 1001" 

6. Decimal to binary conversion #1


a. Determine the minimum number of bits needed to represent the following decimals: 15, 16, 511, 
12,345, and 49,999. 


b. Using the minimum number of bits, write the binary vectors corresponding to the unsigned
decimals above.


7. Decimal to binary conversion #2


a. Determine the minimum number of bits needed to represent the following decimals: 63, 64, 512, 
2007, and 99,999. 


b. Using the minimum number of bits, write the binary vectors corresponding to the unsigned 
decimals above. 


8. Decimal to hexadecimal conversion #1 


Using four hexadecimal digits, write the hexadecimal numbers corresponding to the following 
decimals: 63, 64, 512, 2007, and 49,999. 


9. Decimal to hexadecimal conversion #2 


Using the minimum possible number of hexadecimal digits, write the hexadecimal numbers corre- 
sponding to the following decimals: 255, 256, 4096, and 12,345. 


10. Hexadecimal to binary conversion #1 


Using N=16 bits, write the binary strings corresponding to the following hexadecimal numbers: 
AA, 99C, 000F, and FF7F.


11. Hexadecimal to binary conversion #2 


Using the minimum possible number of bits, write the binary strings corresponding to the following 
hexadecimal numbers: D, 29C, F000, and 13FF. 


12. Hexadecimal to decimal conversion #1 

Convert the following hexadecimal numbers to decimal: D, 99C, 000F, and 1FF7F. 
13. Hexadecimal to decimal conversion #2 

Convert the following hexadecimal numbers to decimal: AA, 990, 7001, and FF007.
14. Octal to decimal conversion 

Convert the following octal numbers to decimal: 3, 77, 0011, and 2222. 
15. Decimal to octal conversion 

Write the octal representation for the following decimals: 3, 77, 111, and 2222. 
16. Decimal to BCD conversion #1

Write the BCD representation for the following decimals: 3, 77, 001, and 2222. 
17. Decimal to BCD conversion #2 

Write the BCD representation for the following decimals: 03, 65, 900, and 7890. 
18. BCD to decimal conversion 

Convert the following BCD numbers to decimal: 

a. "0101" 

b. "1001" "0111" 

c. "0000" "0110" "0001"

19. Gray code #1

Starting with "11111", construct a 5-bit Gray code.

20. Gray code #2

Starting with "00000", construct a 5-bit Gray code.

21. Decimal range #1

a. Give the maximum decimal range that can be covered in unsigned systems with the following
number of bits: 6, 12, and 24.

b. Repeat the exercise for signed systems (with two’s complement).

22. Decimal range #2

a. Give the maximum decimal range that can be covered in unsigned systems with the following
number of bits: 8, 16, and 32.

b. Repeat the exercise for signed systems (with two’s complement).

23. Decimal to sign-magnitude conversion

Represent the following signed decimals using 7-bit sign-magnitude encoding: +3, -3, +31, -31, +48, -48.

24. Sign-magnitude to decimal conversion

Give the signed decimals corresponding to the following binary sequences belonging to a system
where sign-magnitude is used to represent negative numbers:

a. "00110011"
b. "10110011"
c. "11001100"

25. Decimal to one’s complement conversion

Represent the following signed decimals using 7-bit one’s complement encoding: +3, -3, +31, -31,
+48, -48.

26. One’s complement to decimal conversion

Write the signed decimals corresponding to the following numbers from a signed system that
employs one’s complement encoding:

a. "010101"
b. "101010"
c. "0000 0001"
d. "1000 0001"

27. Decimal to two’s complement conversion #1

Represent the following signed decimals using 7-bit two’s complement encoding: +3, -3, +31, -31,
+48, -48.

28. Decimal to two’s complement conversion #2

Represent the following signed decimals using 8-bit two’s complement encoding: +1, -1, +31, -31,
+64, -64.

29. Two’s complement to decimal conversion #1

Write the signed decimals corresponding to the following numbers from a signed system that
employs two’s complement encoding:

a. "010101"
b. "101010"
c. "0000 0001"
d. "1000 0001"

30. Two’s complement to decimal conversion #2

Write the signed decimals corresponding to the following numbers from a signed system that
employs two’s complement encoding:

a. "0101"
b. "1101"
c. "0111 1111"
d. "1111 1111"

31. Floating-point representation

For each result in Example 2.9, make a sketch similar to that in Figure 2.10 showing all bit values.

32. Binary to floating-point conversion

Suppose that y1 = 11111111 11111111 11111111 11111111 is a 32-bit integer. Find its single-precision
floating-point representation, y2. Is there any error in y2?

33. Decimal to floating-point conversion #1

Find the single-precision floating-point representation for the following decimals:

a. 0.1875
b. -0.1875
c. 1
d. 4.75

34. Decimal to floating-point conversion #2

Determine the single-precision floating-point representation for the following decimals:

a. 25
b. -25

35. Floating-point to decimal conversion #1


Convert to decimal the FP numbers depicted in Figure E2.35. 


[Figure E2.35: two single-precision FP bit patterns, (a) and (b), drawn under the bit weights 2^7 ... 2^0 (exponent) and 2^-1 ... 2^-23 (fraction); the individual bit values are illegible.]


FIGURE E2.35.


36. Floating-point to decimal conversion #2 
Convert to decimal the single-precision FP numbers below. 
a. S=0, F=0011000...0, and E=01111100 
b. S=1, F=1110000...0, and E=10001000 

37. ASCII code #1 


Write the 21-bit sequence corresponding to the 3-character string “Hi!” encoded using the ASCII 
code. 


38. ASCII code #2 


Write the 21-bit sequence corresponding to the 3-character string “MP3” encoded using the ASCII 
code. 


39. ASCII code #3 


What is the sequence of characters represented by the ASCII-encoded sequence "1010110 1001000 
1000100 1001100"? 


40. UTF-8 unicode encoding #1 


Determine the UTF-8 encoding strings for the Unicode points shown in the first column of Figure E2.40 
and prove that the values shown in the second column are correct. 


Unicode point   UTF-8          UTF-16         UTF-32
U+00A1          C2 A1          00 A1          00 00 00 A1
U+050C          D4 8C          05 0C          00 00 05 0C
U+F333          EF 8C B3       F3 33          00 00 F3 33
U+1234A         F0 92 8D 8A    D8 08 DF 4A    00 01 23 4A

Note: All values above are hexadecimal.

FIGURE E2.40.

41. UTF-8 unicode encoding #2

Using UTF-8 encoding, determine the total number of bytes necessary to transmit the following
sequence of Unicode characters (given in hexadecimal format): U+0031, U+0020, U+1000, U+0020,
U+020000.

42. UTF-8 unicode encoding #3

Using UTF-8 encoding, determine the bit string that would result from the encoding of each one of
the following Unicode points (the points are given in hexadecimal format, so give your answers in
hexadecimal format): U+002F, U+01FF, U+11FF, U+1111F.

43. UTF-16 unicode encoding #1

Determine the UTF-16 encoding strings for the Unicode points shown in the first column of Figure E2.40
and prove that the values shown in the third column are correct.

44. UTF-16 unicode encoding #2

Using UTF-16 encoding, determine the total number of bytes necessary to transmit the following
sequence of Unicode characters (given in hexadecimal format): U+0031, U+0020, U+1000, U+0020,
U+020000.

45. UTF-16 unicode encoding #3

Using UTF-16 encoding, determine the bit string that would result from the encoding of each one of
the following Unicode points (the points are given in hexadecimal format, so give your answers in
hexadecimal format): U+002F, U+01FF, U+11FF, U+1111F.

46. UTF-32 unicode encoding #1

Determine the UTF-32 encoding strings for the Unicode points shown in the first column of Figure E2.40
and prove that the values shown in the fourth column are correct.

47. UTF-32 unicode encoding #2

Using UTF-32 encoding, determine the total number of bytes necessary to transmit the following
sequence of Unicode characters (given in hexadecimal format): U+0031, U+0020, U+1000, U+0020,
U+020000.

48. UTF-32 unicode encoding #3

Using UTF-32 encoding, determine the bit string that would result from the encoding of each one of
the following Unicode points (the points are given in hexadecimal format, so give your answers in
hexadecimal format): U+002F, U+01FF, U+11FF, U+1111F.


Binary Arithmetic


Objective: Humans are used to doing arithmetic operations with decimal numbers, while computers 
perform similar arithmetic operations but use the binary system of '0's and '1's. The objective of this 
chapter is to show how the latter occurs. The analysis includes unsigned and signed values, of both integer 
and real-valued types. Because shift operations can also implement certain arithmetic functions, they too 
are included in this chapter. 


Chapter Contents 


3.1 Unsigned Addition
3.2 Signed Addition and Subtraction
3.3 Shift Operations
3.4 Unsigned Multiplication
3.5 Signed Multiplication
3.6 Unsigned Division
3.7 Signed Division
3.8 Floating-Point Addition and Subtraction
3.9 Floating-Point Multiplication
3.10 Floating-Point Division
3.11 Exercises


3.1 Unsigned Addition


Binary addition (also called modulo-2 addition) was introduced in Section 2.5, with Figure 2.5 repeated in
Figure 3.1 below. The vectors within the gray area are given, while the others must be calculated. a = a3a2a1a0
and b = b3b2b1b0 represent 4-bit numbers to be added, producing a 5-bit sum vector, sum = s4s3s2s1s0, and a
4-bit carry vector, carry = c4c3c2c1. The algorithm is summarized in (b) (recall that it is a modulo-2 operation).
In (c), an example is given in which "1100" (=12) is added to "0110" (=6), producing "10010" (=18) at the
output. The carry bits produced during the additions are also shown.

As shown in the truth table of Figure 3.1(b), the sum bit is '1' when the number of inputs that are high 
is odd, so this is an odd parity function. It can also be seen that the carry bit is '1' when two or more of the 
three input bits are high, so this is a majority function. 

(a)
   c4 c3 c2 c1        (carry)
      a3 a2 a1 a0
   +  b3 b2 b1 b0
   s4 s3 s2 s1 s0     (sum)

(b)
   carry-in   inputs   sum        carry-out
   cin        a b      cin+a+b    cout
   0          0 0      0          0
   0          0 1      1          0
   0          1 0      1          0
   0          1 1      2→0        1
   1          0 0      1          0
   1          0 1      2→0        1
   1          1 0      2→0        1
   1          1 1      3→1        1

(c)
   1  1  0  0         (carry)
      1  1  0  0      (=12)
   +  0  1  1  0      (=6)
   1  0  0  1  0      (=18)


FIGURE 3.1. Three-input binary addition: (a) Traditional addition assembly; (b) Truth table; (c) Addition example.

In Figure 3.1(a), the sum has N+1 bits, where N is the number of bits at the input. However, computers
normally provide only N bits for the sum, sometimes without a carry-out bit, in which case overflow can
occur (that is, the actual result is greater than that produced by the circuit). All of these aspects are described
below, where the following two cases are examined:


Case 1: Unsigned addition with N-bit inputs and N-bit output 
Case 2: Unsigned addition with N-bit inputs and (N+1)-bit output


Case 1 Unsigned addition with N-bit inputs and N-bit output 


When the output has the same number of bits as the inputs, overflow can occur. For example, with 4 bits 
the input range is from 0 to 15, thus the sum can be as big as 30. Consequently, because the output range 
is also from 0 to 15, not all operations will be correct. The overflow check criteria are described below 
and are self-explanatory. 


Overflow check based on carry: If the last carry bit is '1', then overflow has occurred. 


Overflow check based on operands: Overflow occurs when the MSBs of both operands are '1' or when the 
MSBs are different and the sum’s MSB is '0'. 
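
The two overflow checks can be tried out with a short script. The sketch below uses Python purely for illustration; the helper names (full_adder, add_unsigned, overflow_by_operands) are assumptions introduced here and do not come from the text. It applies the truth table of Figure 3.1(b) one bit at a time and evaluates both criteria for the additions (9+6) and (9+7).

```python
def full_adder(a, b, cin):
    """One-bit addition following the truth table of Figure 3.1(b)."""
    s = a ^ b ^ cin                          # odd-parity function
    cout = (a & b) | (a & cin) | (b & cin)   # majority function
    return s, cout


def add_unsigned(a_bits, b_bits):
    """Add two equal-length bit strings (MSB first).

    Returns the N-bit sum string and the last carry bit.
    """
    carry = 0
    out = []
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        s, carry = full_adder(int(a), int(b), carry)
        out.append(str(s))
    return "".join(reversed(out)), carry


def overflow_by_operands(a_bits, b_bits, sum_bits):
    """Operand-based overflow criterion for unsigned N-bit addition."""
    both_high = a_bits[0] == "1" and b_bits[0] == "1"
    differ_and_sum_low = a_bits[0] != b_bits[0] and sum_bits[0] == "0"
    return both_high or differ_and_sum_low


for x, y in (("1001", "0110"), ("1001", "0111")):   # (9+6) and (9+7)
    total, last_carry = add_unsigned(x, y)
    print(x, "+", y, "=", total,
          "| overflow by carry:", last_carry == 1,
          "| overflow by operands:", overflow_by_operands(x, y, total))
```

For every pair of 4-bit operands the two criteria give the same answer, which is expected because the last carry is '1' exactly in the situations listed in the operand-based rule.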


EXAMPLE 3.1 UNSIGNED ADDITION #1
For 4-bit inputs and 4-bit output, calculate (9+6) and (9+7) and check whether the results are valid. 


SOLUTION 


The solution is depicted in Figure 3.2. The sum and carry bits (shown within gray areas) were 
obtained using the table of Figure 3.1(b). In the second sum, overflow occurs. 


Operation   Operands       Carry (c4c3c2c1)   Sum (4 bits)   Last carry bit   Result
9+6         1001 + 0110    0000               1111 (15)      0                OK
9+7         1001 + 0111    1111               0000 (0)       1                Not OK

FIGURE 3.2. Solutions of Example 3.1.

Case 2 Unsigned addition with N-bit inputs and (N+1)-bit output


In this case, overflow cannot occur because the input range is 0 ≤ input ≤ 2^N - 1, while the allowed
output range is 0 ≤ output ≤ 2^(N+1) - 1.


EXAMPLE 3.2 UNSIGNED ADDITION #2
For 4-bit inputs and 5-bit output, repeat the (9+6) and (9+7) additions and check whether or not the
results are valid.

SOLUTION


The solution is depicted in Figure 3.3. Again, sum and carry (shown within gray areas) were obtained 
using the table of Figure 3.1(b). Both results are valid. 


Operation   Operands       Carry (c4c3c2c1)   Sum (5 bits)   Result
9+6         1001 + 0110    0000               01111 (15)     OK
9+7         1001 + 0111    1111               10000 (16)     OK

FIGURE 3.3. Solutions of Example 3.2.


3.2 Signed Addition and Subtraction 


Subtraction is very similar to addition. Figure 3.4 illustrates how unsigned subtraction can be computed. In 
Figure 3.4(a), a traditional subtraction arrangement, similar to that of Figure 3.1(a), is shown. However, in 
the truth table presented in Figure 3.4(b), the differences between sum and subtraction are made clear where 
borrow is employed instead of carry. Recall that in a binary system all arithmetic operations are modulo-2, so 
when a negative result occurs it is increased by 2 and a borrow-out occurs. An example is shown in Figure 
3.4(c), in which "1001" (=9) minus "0111" (=7) is computed. Following the truth table, "00010" (=2) results. 


(a)
   w4 w3 w2 w1        (borrow)
      a3 a2 a1 a0
   -  b3 b2 b1 b0
   s4 s3 s2 s1 s0     (subtr.)

(b)
   borrow-in   inputs   subtraction   borrow-out
   win         a b      win+a-b       wout
   0           0 0      0             0
   0           0 1      -1→1          -1
   0           1 0      1             0
   0           1 1      0             0
   -1          0 0      -1→1          -1
   -1          0 1      -2→0          -1
   -1          1 0      0             0
   -1          1 1      -1→1          -1

(c)
   0 -1 -1  0         (borrow)
      1  0  0  1      (=9)
   -  0  1  1  1      (=7)
   0  0  0  1  0      (=2)

FIGURE 3.4. Three-input binary subtraction: (a) Traditional subtraction assembly; (b) Truth table; (c) Subtraction example.

The subtraction algorithm described above is only illustrative because the approach used in most
actual implementations is two’s complement (described in Section 2.5); that is, all negative numbers are 
represented in two’s complement format so they can be added to any other values instead of being sub- 
tracted from them. Therefore, only adders are actually needed (because a—b=a+(-b), where —b is the 
two’s complement of b). As seen in Section 2.5, to obtain the two’s complement of a number, first all bits 
are inverted and then 1 is added to the result. For example, for -6, we start with +6 (="0110"), then invert 
all bits ("1001") and add 1 to the result ("1010" =-6). 
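
As a small illustration of the invert-and-add-1 rule, the following Python sketch computes the two's complement of a bit string and then performs a subtraction as an addition. The function names twos_complement and subtract are assumptions made for this example only.

```python
def twos_complement(bits):
    """Invert all bits of the string, then add 1 (result kept to the same width)."""
    n = len(bits)
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    value = (int(inverted, 2) + 1) % (1 << n)
    return format(value, f"0{n}b")


def subtract(a_bits, b_bits):
    """Compute a - b as a + (-b), discarding any carry beyond N bits."""
    n = len(a_bits)
    total = (int(a_bits, 2) + int(twos_complement(b_bits), 2)) % (1 << n)
    return format(total, f"0{n}b")


print(twos_complement("0110"))    # +6 becomes "1010", the pattern for -6
print(subtract("0101", "0010"))   # 5 - 2 gives "0011"
```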


Again, two cases are described: 
Case 1: Signed addition and subtraction with N-bit inputs and N-bit output 
Case 2: Signed addition and subtraction with N-bit inputs and (N+1)-bit output


Case 1 Signed addition and subtraction with N-bit inputs and N-bit output


Recall that signed positive numbers are represented in the same way as unsigned numbers are, with the 
particularity that the MSB must be '0'. If the MSB is '1', then the number is negative and represented in 
two’s complement form. Recalling also that addition and subtraction are essentially the same operation, 
that is, a—b=a+(—b), only one of them (addition) needs to be considered, so the algorithm specified in 
the table of Figure 3.1(b) suffices. 

In summary, to perform the operation a+b, both operands are applied directly to the adder regard- 
less of their actual signs (that is, the fact of being positive or negative does not affect the hardware). The 
same is true for a—b, except that b in this case must undergo two’s complement transformation before 
being applied to the same adder (this transformation must be performed regardless of b’s actual sign). 
The two’s complemented version will be denoted with an asterisk (a* or b*). 

In the overflow criteria described below, we look at the numbers that actually enter the adder, that is, a and
b when the operation is a+b, a and b* when it is a-b, a* and b when it is -a+b, or a* and b* when it is -a-b.


Overflow check based on carry: If the last two carry bits are different, then overflow has occurred. 


Overflow check based on operands: If the operands’ MSBs are equal and the sum’s MSB is different from
them, then overflow has occurred.


The last criterion above says that when two numbers have the same sign, the sum can only have that 
sign, otherwise overflow has occurred. In other words, if both operands are positive (begin with '0'), the 
sum must be positive, and when both are negative (begin with '1'), the sum must be negative as well. 

The first of the two criteria above says the same thing but in a less obvious way. If both operands begin
with '0', the only way to have the carry-out bit different from the carry-in bit is by having cin='1', which
produces s='1' and cout='0'; in other words, two positive numbers produce a negative result, which is
invalid. Likewise, if both numbers begin with '1', for carry-out to be different from carry-in the latter has
to be cin='0', which then produces s='0' and cout='1'; in other words, two negative numbers produce a
positive sum, which is again invalid. On the other hand, if the operands have different signs ('0' and '1'),
then cout=cin always, so, as expected, overflow cannot occur.
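
The agreement between the two signed criteria can also be checked numerically. The Python sketch below is only an illustration under stated assumptions: the function names are invented here, operands are given as two's complement bit strings of equal length (at least 2 bits), and a subtraction is assumed to have been converted to an addition beforehand.

```python
def add_with_carries(a_bits, b_bits):
    """Add two equal-length bit strings (MSB first).

    Returns the N-bit sum string and the list of carries [c1, ..., cN].
    """
    carry = 0
    carries = []
    out = []
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        a, b = int(a), int(b)
        out.append(str(a ^ b ^ carry))
        carry = (a & b) | (a & carry) | (b & carry)
        carries.append(carry)
    return "".join(reversed(out)), carries


def signed_overflow_by_carry(carries):
    """Overflow if the last two carry bits are different."""
    return carries[-1] != carries[-2]


def signed_overflow_by_operands(a_bits, b_bits, sum_bits):
    """Overflow if the operands' MSBs are equal and the sum's MSB differs."""
    return a_bits[0] == b_bits[0] and sum_bits[0] != a_bits[0]


def to_signed(bits):
    """Interpret a bit string as a two's complement number."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


# (5+3) and (-5-4) overflow with 4 bits; (5-2) does not.
for x, y in (("0101", "0011"), ("1011", "1100"), ("0101", "1110")):
    s, carries = add_with_carries(x, y)
    print(to_signed(x), "+", to_signed(y), "->", s, "=", to_signed(s),
          "| overflow:", signed_overflow_by_carry(carries))
```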


EXAMPLE 3.3 SIGNED ADDITION #1


Using signed 4-bit numbers for inputs and output, calculate the following sums and check the
results’ validity using both overflow check criteria described above: (5+2), (5+3), (-5-3), (-5-4),
(5-2), (5-8).

SOLUTION


The solution is shown in Figure 3.5. Again, the table of Figure 3.1(b) was used to obtain the sum and 
carry bits (shown within gray rectangles). Note that only sums are performed, so when a subtraction 
is needed, the numbers are simply two’s complemented first. 


Operation   Operands entering the adder   Carry (c4c3c2c1)   Last two carry bits   Sum (4 bits)   Result
5+2         0101 (5)  + 0010 (2)          0000               00                    0111 (7)       OK
5+3         0101 (5)  + 0011 (3)          0111               01                    1000 (-8)      Not OK
-5-3        1011 (-5) + 1101 (-3)         1111               11                    1000 (-8)      OK
-5-4        1011 (-5) + 1100 (-4)         1000               10                    0111 (7)       Not OK
5-2         0101 (5)  + 1110 (-2)         1100               11                    0011 (3)       OK
5-8         0101 (5)  + 1000 (-8)         0000               00                    1101 (-3)      OK

FIGURE 3.5. Solutions of Example 3.3.


Case 2 Signed addition and subtraction with N-bit inputs and (N+1)-bit output


In this case, overflow cannot occur because the input range is -2^(N-1) ≤ input ≤ 2^(N-1) - 1, while the
allowed output range is -2^N ≤ output ≤ 2^N - 1. However, the MSB must be modified as follows.


MSB check: If the inputs have different signs, then the MSB must be inverted. 


The operation above is due to the fact that when the operands have N bits, but the sum has N+1 bits,
both operands should be sign extended (see two’s complement representation in Section 2.5) to N+1 bits
before performing the sum. For the positive number, this is simply the inclusion of a '0' in its MSB position,
which does not affect the result anyway. However, for the negative number, a '1' is required to keep its value
and sign. Consequently, when the last carry-out bit is added to the negative extension ('1') it gets inverted
(that is, the last sum bit is the reverse of the last carry-out bit).
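
The equivalence between the MSB rule and explicit sign extension can be confirmed with a short experiment. The Python sketch below is illustrative only; the function names are assumptions introduced here, and operands are given as N-bit two's complement strings.

```python
def add_wide_msb_rule(a_bits, b_bits):
    """(N+1)-bit signed sum using the MSB rule of Case 2."""
    n = len(a_bits)
    total = int(a_bits, 2) + int(b_bits, 2)   # plain N-bit binary addition
    low = total % (1 << n)                    # the N regular sum bits
    msb = total >> n                          # last carry-out bit
    if a_bits[0] != b_bits[0]:                # inputs with different signs
        msb ^= 1                              # the MSB must be inverted
    return format((msb << n) | low, f"0{n + 1}b")


def add_wide_sign_extension(a_bits, b_bits):
    """(N+1)-bit signed sum obtained by sign-extending both operands first."""
    n = len(a_bits)
    a_ext = a_bits[0] + a_bits                # repeat the sign bit
    b_ext = b_bits[0] + b_bits
    total = (int(a_ext, 2) + int(b_ext, 2)) % (1 << (n + 1))
    return format(total, f"0{n + 1}b")


def to_signed(bits):
    """Interpret a bit string as a two's complement number."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


# (7+7), (-8-8), (7-8), (-7+7) with 4-bit inputs and 5-bit output
for x, y in (("0111", "0111"), ("1000", "1000"),
             ("0111", "1000"), ("1001", "0111")):
    r = add_wide_msb_rule(x, y)
    print(x, "+", y, "=", r, "=", to_signed(r),
          "| sign extension gives", add_wide_sign_extension(x, y))
```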


EXAMPLE 3.4 SIGNED ADDITION #2


Using signed 4-bit inputs and 5-bit output, calculate the following sums and check the results’
validity: (7+7), (-8-8), (7-8), (-7+7).


SOLUTION 


The solution is shown in Figure 3.6(a). Again, the table of Figure 3.1(b) was used to obtain the sum 
and carry bits (shown within gray rectangles). For the MSB, the rule described above was applied 
(indicated by an arrow in the figure). A second solution is depicted in Figure 3.6(b), in which sign 
extension was used.

(a) Using the MSB rule of Case 2

Operation   Operands                  Carry (c4c3c2c1)   Last carry bit   MSB            Sum (5 bits)   Result
7+7         0111 (7)  + 0111 (7)      0111               0                0 (kept)       01110 (14)     OK
-8-8        1000 (-8) + 1000 (-8)     1000               1                1 (kept)       10000 (-16)    OK
7-8         0111 (7)  + 1000 (-8)     0000               0                1 (inverted)   11111 (-1)     OK
-7+7        1001 (-7) + 0111 (7)      1111               1                0 (inverted)   00000 (0)      OK

(b) Using sign extension

Operation   Sign-extended operands      Sum (5 bits)   Result
7+7         00111 (7)  + 00111 (7)      01110 (14)     OK
-8-8        11000 (-8) + 11000 (-8)     10000 (-16)    OK
7-8         00111 (7)  + 11000 (-8)     11111 (-1)     OK
-7+7        11001 (-7) + 00111 (7)      00000 (0)      OK

FIGURE 3.6. Solutions of Example 3.4 using (a) the rule described in Case 2 or (b) sign extension.


3.3 Shift Operations


The three main shift operations are logical shift, arithmetic shift, and circular shift (rotation). 


Logical shift 

The binary word is shifted to the right or to the left a certain number of positions; the empty positions are 
filled with '0's. In VHDL, this operator is represented by SRL n (shift right logical n positions) or SLL n 
(shift left logical n positions). 


Examples: 


"01011" SRL 2="00010" (illustrated in Figure 3.7(a)) 
"10010" SRL 2="00100" (illustrated in Figure 3.7(b)) 
"11001" SLL 1="10010" 

"11001" SRL -1="10010" 


Arithmetic shift 

The binary vector is shifted to the right or to the left a certain number of positions. When shifted to 
the right, the empty positions are filled with the original leftmost bit value (sign bit). However, when 
shifted to the left, there are conflicting definitions. In some cases, the empty positions are filled with '0's 
(this is equivalent to logical shift), while in others they are filled with the rightmost bit value. In VHDL, 
the latter is adopted, so that is the definition that we will also adopt in the examples below. The VHDL 
arithmetic shift operator is represented by SRA n (shift right arithmetic n positions) or SLA n (shift left
arithmetic n positions).


Examples: 


"10110" SRA 2="11101"
"10011" SLA 1="00111"


(a) "01011" → "00010"
(b) "10010" → "00100"


FIGURE 3.7. Logical shift two positions to the right of (a) "01011" and (b) "10010".


Circular shift (Rotation) 


This case is similar to logical shift with the only difference that the empty positions are filled with the 
removed bits instead of '0's. In VHDL, this operator is represented by ROR n (rotate right n positions) or
ROL n (rotate left n positions).


Examples: 


00110 ROR 2=10001 
11010 ROL 1=10101 
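
The three kinds of shift can be mimicked on bit strings as shown in the Python sketch below. It is an illustration only, not VHDL: the lowercase function names are assumptions chosen to echo the operator names above, the shift amount n is assumed to satisfy 0 ≤ n ≤ len(bits), and sla follows the definition adopted in this section (empty positions filled with the rightmost bit).

```python
def sll(bits, n):
    """Shift left logical: empty positions on the right are filled with '0'."""
    return bits[n:] + "0" * n


def srl(bits, n):
    """Shift right logical: empty positions on the left are filled with '0'."""
    return "0" * n + bits[:len(bits) - n]


def sra(bits, n):
    """Shift right arithmetic: empty positions are filled with the sign bit."""
    return bits[0] * n + bits[:len(bits) - n]


def sla(bits, n):
    """Shift left arithmetic: empty positions are filled with the rightmost bit."""
    return bits[n:] + bits[-1] * n


def ror(bits, n):
    """Rotate right: removed bits re-enter on the left."""
    n %= len(bits)
    return bits[-n:] + bits[:-n]


def rol(bits, n):
    """Rotate left: removed bits re-enter on the right."""
    n %= len(bits)
    return bits[n:] + bits[:n]


print(srl("01011", 2), sll("11001", 1))   # logical shifts
print(sra("10110", 2), sla("10011", 1))   # arithmetic shifts
print(ror("00110", 2), rol("11010", 1))   # rotations
```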


Shift operations can be used for division and multiplication as described below. 


Division using logical shift 

For unsigned numbers, a logical shift to the right by one position causes the number to be divided by 2
(the result is rounded down when the number is odd). This process, of course, can be repeated k times,
resulting in a division by 2^k.


Examples: 


00110 (6) → 00011 (3)
11001 (25) → 01100 (12)


Division using arithmetic shift 


For signed numbers, an arithmetic shift to the right by one position causes the number to be divided by 
2. The magnitude of the result is rounded down when the number is odd positive or rounded up when 
it is odd negative. 


Examples: 


01110 (14) → 00111 (7)
00111 (7) → 00011 (3)
10000 (-16) → 11000 (-8)
11001 (-7) → 11100 (-4)


Multiplication using logical shift 

A logical shift to the left by one position causes a number to be multiplied by 2. However, to avoid
overflow, the leftmost bit must be '0' when the number is unsigned, or the two leftmost bits must be "00" or
"11" when it is signed positive or signed negative, respectively.

Examples with unsigned numbers:

01110 (14) → 11100 (28) Correct
11010 (26) → 10100 (20) Incorrect (overflow)


Examples with signed numbers:

00111 (7) → 01110 (14) Correct
01110 (14) → 11100 (-4) Incorrect (overflow)
11000 (-8) → 10000 (-16) Correct
10000 (-16) → 00000 (0) Incorrect (overflow)


Multiplication using logical shift and a wider output 


When performing multiplication, it is common to assign to the output signal twice the number of bits as 
the input signals (that is, 2N bits when the inputs are N bits wide). In this case, overflow can never occur 
even if up to N shifts occur (that is, if the number is multiplied by up to 2^N). The new N bits added to the
original binary word must be located on its left side and initialized with '0's if the number is unsigned, or 
with the sign bit (MSB) if it is signed. 


Examples with 4-bit unsigned numbers and 4 shifts:

0011 (original = 3) → 0000 0011 (doubled = 3) → 0011 0000 (shifted = 48) Correct
1000 (original = 8) → 0000 1000 (doubled = 8) → 1000 0000 (shifted = 128) Correct

Examples with 4-bit signed numbers and 4 shifts:

0011 (original = 3) → 0000 0011 (doubled = 3) → 0011 0000 (shifted = 48) Correct
1000 (original = -8) → 1111 1000 (doubled = -8) → 1000 0000 (shifted = -128) Correct
1111 (original = -1) → 1111 1111 (doubled = -1) → 1111 0000 (shifted = -16) Correct


3.4 Unsigned Multiplication 


Similarly to addition, multiplication can also be unsigned or signed. The former case is presented in this 
section, while the latter is shown in the next. In both, N bits are employed for the inputs, while 2N bits 
are used for the output (this is the case in any regular multiplier). 

Figure 3.8(a) depicts the well-known multiplication algorithm between two unsigned numbers, where
a = a3a2a1a0 is the multiplier, b = b3b2b1b0 is the multiplicand, and p = p7 ... p2p1p0 is the product. Because the
output is 2N bits wide, overflow cannot occur.


EXAMPLE 3.5 UNSIGNED MULTIPLICATION
Multiply the unsigned numbers "1101" and "1100" using the algorithm of Figure 3.8(a). 


SOLUTION 


The multiplication is shown in Figure 3.8(b), where "1101" (=13) is multiplied by "1100" (=12), 
producing "10011100" (=156).

(a)
                                b3     b2     b1     b0     ← Multiplicand
 ×                              a3     a2     a1     a0     ← Multiplier
                                a0b3   a0b2   a0b1   a0b0
                         a1b3   a1b2   a1b1   a1b0          ← Partial products
                  a2b3   a2b2   a2b1   a2b0
 +         a3b3   a3b2   a3b1   a3b0
    p7     p6     p5     p4     p3     p2     p1     p0     ← Product

(b)
        1101     (13)
      × 1100     (12)
        0000
       0000
      1101
   + 1101
    10011100     (156)


FIGURE 3.8. (a) Unsigned multiplication algorithm; (b) Solution of Example 3.5.


Any multiplication algorithm is derived from the basic algorithm depicted in Figure 3.8(a). For 
example, most dedicated hardware multipliers implemented from scratch (at transistor- or gate- 
level) are a straight implementation of that algorithm (see Section 12.9). However, multiplication 
can also be performed using only addition plus shift operations. This approach is appropriate, for 
example, when using a computer to do the multiplications because its ALU (arithmetic logic unit, 
Section 12.8, which is at the core of any processor) can easily do the additions while the control unit 
can easily cause the data registers to be shifted as needed. To distinguish this kind of approach from 
those at the transistor- or gate-level, we will refer to the former as ALU-based algorithms. 

An example of ALU-based unsigned multiplication is illustrated in Figure 3.9. The multiplicand 
and multiplier are "1001" (=9) and "1100" (=12), respectively. Because the inputs are 4 bits wide, 
4 iterations are needed, and the product register must be 8 bits wide to prevent overflow. Initially, 
the right half of the product register is filled with the multiplier and the left half with '0's. The algo- 
rithm is then the following: If the rightmost product bit is '1', then the multiplicand is added to the 
left half of the product, then the product register is shifted to the right one position; however, if the 
bit is '0', then only the shift operation must occur. The type of shift is logical (Section 3.3). Applying 
this 4 times, "01101100" (= 108) results. 
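
The add-and-shift procedure just described can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any particular ALU: the function name multiply_shift_add is invented here, and the carry produced by the N-bit addition is shifted into the left half, a detail that does not arise in the example of Figure 3.9 but is needed for larger operands such as 15 × 15.

```python
def multiply_shift_add(multiplicand, multiplier, n):
    """Unsigned multiplication of two n-bit values using additions and logical shifts."""
    mask = (1 << n) - 1
    left = 0                      # left half of the product register
    right = multiplier & mask     # right half, initialized with the multiplier
    for _ in range(n):
        carry = 0
        if right & 1:             # rightmost product bit is '1'
            total = left + multiplicand
            carry, left = total >> n, total & mask
        # Shift the whole register one position to the right.
        right = (right >> 1) | ((left & 1) << (n - 1))
        left = (left >> 1) | (carry << (n - 1))
    return (left << n) | right


product = multiply_shift_add(0b1001, 0b1100, 4)   # 9 x 12
print(format(product, "08b"), product)
```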


Iteration   Procedure                                   Multiplicand   Product
                                                                       Left   Right
0           Initialization (multiplier in ProdRight)    1001           0000   1100
1           Bit=0 → No operation                                       0000   1100
            Shift right logic                                          0000   0110
2           Bit=0 → No operation                                       0000   0110
            Shift right logic                                          0000   0011
3           Bit=1 → ProdLeft + Multiplicand                            1001   0011
            Shift right logic                                          0100   1001
4           Bit=1 → ProdLeft + Multiplicand                            1101   1001
            Shift right logic                                          0110   1100


FIGURE 3.9. Unsigned multiplication algorithm utilizing only addition and logical shift.


3.5 Signed Multiplication


Algorithms for signed multiplication are also derived from the basic algorithm shown in Figure 3.8(a). 
An example, which is useful for hardware multipliers designed from scratch (at transistor- or gate-level), 
is illustrated in Figure 3.10 (modified Baugh-Wooley multiplier). Comparing Figure 3.10 to Figure 3.8(a), we 
observe that several most-significant partial products were inverted in the former and also that two '1's 
were included along the partial products (shown within gray areas). These little changes can be easily 
incorporated into the hardware of the same circuit that implements the algorithm of Figure 3.8(a), hence 
resulting in a programmable signed/unsigned multiplier.
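
The effect of the inverted partial products and of the two extra '1's can be verified numerically. The Python sketch below assumes the usual modified Baugh-Wooley arrangement: every partial product that involves exactly one of the two sign bits is inverted, and '1's are added at bit positions N and 2N-1. The function names are assumptions introduced for this example.

```python
def baugh_wooley(a, b, n=4):
    """Signed product of two n-bit two's complement patterns (passed as unsigned ints).

    Returns the 2n-bit two's complement pattern of the product.
    """
    total = (1 << n) | (1 << (2 * n - 1))        # the two extra '1's
    for i in range(n):
        for j in range(n):
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            if (i == n - 1) != (j == n - 1):     # exactly one sign bit involved
                bit ^= 1                         # inverted partial product
            total += bit << (i + j)
    return total % (1 << (2 * n))                # keep only 2n bits


def to_signed(value, bits):
    """Interpret an unsigned pattern of the given width as two's complement."""
    return value - (1 << bits) if value >> (bits - 1) else value


p = baugh_wooley(0b1100, 0b1101)                 # (-4) x (-3)
print(format(p, "08b"), to_signed(p, 8))
```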


                                        b3       b2       b1       b0
 ×                                      a3       a2       a1       a0
                               1        ~(a0b3)  a0b2     a0b1     a0b0
                               ~(a1b3)  a1b2     a1b1     a1b0
                      ~(a2b3)  a2b2     a2b1     a2b0
 +  1        a3b3     ~(a3b2)  ~(a3b1)  ~(a3b0)
    p7       p6       p5       p4       p3       p2       p1       p0

(~(x) denotes an inverted partial product.)


FIGURE 3.10. Signed multiplication algorithm.


EXAMPLE 3.6 SIGNED MULTIPLICATION
Multiply the signed numbers "1101" and "1100". 


SOLUTION 


This multiplication is shown in Figure 3.11. "1101" (—3) and "1100" (—4) are negative numbers, so the 
algorithm of Figure 3.10 can be employed. To better visualize the process, first the regular partial 
products are shown (on the left of Figure 3.11), then the appropriate partial products are inverted 
and the '1's are included, producing "00001100" (+12) at the output. 


Regular partial products:

        1101     (-3)
      × 1100     (-4)
        0000
       0000
      1101
     1101

Partial products after the inversions and the inclusion of the '1's:

        1101     (-3)
      × 1100     (-4)
       11000
       1000
      0101
    11010
    00001100     (+12)


FIGURE 3.11. Solution of Example 3.6.


Iteration   Procedure                                          Multiplicand   Product
                                                                              Left    Right   Extra
0           Initialization (Multiplier loaded into ProdRight)  10010          00000   01110   0
1           Bits=00 → No operation                                            00000   01110   0
            Shift right arith                                                 00000   00111   0
2           Bits=10 → ProdLeft - Multiplicand                                 01110   00111   0
            Shift right arith                                                 00111   00011   1
3           Bits=11 → No operation                                            00111   00011   1
            Shift right arith                                                 00011   10001   1
4           Bits=11 → No operation                                            00011   10001   1
            Shift right arith                                                 00001   11000   1
5           Bits=01 → ProdLeft + Multiplicand                                 10011   11000
            Shift right arith                                                 11001   11100


FIGURE 3.12. Booth’s algorithm for signed multiplication (with addition/subtraction and arithmetic shift).


For signed multiplication, ALU-based algorithms (see comments in Section 3.4) also exist. Booth’s 
algorithm, illustrated in Figure 3.12, is a common choice. The multiplier and multiplicand are "01110" 
(+14) and "10010" (—14), respectively. Because the inputs are 5 bits wide, five iterations are needed, and 
the product register must be 10 bits wide to prevent overflow. Upon initialization, the multiplier is 
loaded into the right half of the product register, while the other positions are filled with '0's, including 
an extra '0' on the right of the product. The algorithm is then the following: If the two rightmost bits are 
"10", the multiplicand must be subtracted from the left half of the product register, and the result must be 
arithmetically shifted (Section 3.3) to the right; if the bits are "01", then the multiplicand must be added 
to the left half of the product register and the result arithmetically shifted to the right; finally, if the bits 
are "00" or "11", then only arithmetic shift to the right must occur. Applying this procedure five times, 
"1100111100" (—196) results in Figure 3.12. 

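The steps of Booth's algorithm can be traced with the Python sketch below. It is a minimal illustration under stated assumptions: the function name booth_multiply is invented here, the operands are passed as ordinary signed integers that fit in n bits, and the left half of the product register is held as a signed Python integer, so it behaves like a register with spare guard bits rather than one of exactly n bits.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Signed multiplication using add/subtract and arithmetic right shifts."""
    mask = (1 << n) - 1
    left = 0                      # left half of the product register (signed)
    right = multiplier & mask     # right half, loaded with the multiplier pattern
    extra = 0                     # extra bit on the right of the product
    for _ in range(n):
        pair = (right & 1, extra)             # the two rightmost bits
        if pair == (1, 0):
            left -= multiplicand              # "10": subtract the multiplicand
        elif pair == (0, 1):
            left += multiplicand              # "01": add the multiplicand
        # Arithmetic shift right of left:right:extra by one position.
        extra = right & 1
        right = (right >> 1) | ((left & 1) << (n - 1))
        left >>= 1                            # on Python ints, >> preserves the sign
    return left * (1 << n) + right


print(booth_multiply(-14, 14, 5))             # multiplicand "10010", multiplier "01110"
```

With the operands of Figure 3.12, the intermediate values of left and right follow the rows of that table, ending with left = "11001" and right = "11100".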

3.6 Unsigned Division 


Similarly to multiplication, division can also be unsigned or signed. The former case is presented in this 
section, while the latter is shown in the next. If N bits are used to represent the inputs, then a total of 2N 
bits are again needed to represent the outputs (quotient and remainder). 

Figure 3.13 shows the well-known division algorithm, where "1101" (= 13) is the dividend, "0101" (=5) 
is the divisor, "0010" (=2) is the quotient, and "0011" (=3) is the remainder. Note that for the quotient 
to have the same number of bits as the dividend, the latter was filled with '0's in the most significant 
positions until a total of 2N—1 bits resulted, where N is the size of the divisor (N=4 in this example). 

From a hardware perspective, dedicated dividers are more difficult to implement than dedicated mul- 
tipliers. For that reason, the most common approach is to employ ALU-based algorithms (see comments in 
Section 3.4), which make use of only addition/subtraction plus shift operations. 

An algorithm of this kind is illustrated in Figure 3.14, where "1101" (=13) is the dividend and "0101" (=5) 
is the divisor. Because these numbers are 4 bits long, the number of iterations is 4 and the remainder 
register must be 8 bits long. During initialization, the dividend is loaded into the right half of the remainder 
register, and then the whole vector is shifted one position to the left with a '0' filling the empty (rightmost) 
position. The algorithm is then the following: The divisor is subtracted from the left half of the remainder; 
if the result is negative, then the divisor is added back to the left half of the remainder, restoring its value, 
and a left shift is performed, again with a '0' filling the empty position; otherwise, if the result is zero or 
positive, the remainder register is shifted left with a '1' filling the empty position. After the last iteration, 
the left half of the remainder register must be shifted to the right with a '0' filling the empty position. The 
quotient then appears in the right half of the remainder register, while the actual remainder appears in its 
left half. As can be seen in Figure 3.14, in this example the results are quotient="0010" (=2) and 
remainder="0011" (=3). 


                    0010    ← Quotient
Divisor → 0101 | 0001101    ← Dividend
                   0101
                 -------
                    0011    ← Remainder


FIGURE 3.13. Unsigned division algorithm. 


Iteration  Procedure                                           Divisor  RemLeft  RemRight
0          Initialization (dividend is loaded into RemRight)   0101     0000     1101
           Shift Rem left with '0' in empty position                    0001     1010
1          RemLeft - Divisor                                            1100     1010
           Bit=1 → RemLeft + Divisor                                    0001     1010
           Bit=1 → Shift Rem left with '0'                              0011     0100
2          RemLeft - Divisor                                            1110     0100
           Bit=1 → RemLeft + Divisor                                    0011     0100
           Bit=1 → Shift Rem left with '0'                              0110     1000
3          RemLeft - Divisor                                            0001     1000
           Bit=0 → No operation                                         0001     1000
           Bit=0 → Shift Rem left with '1'                              0011     0001
4          RemLeft - Divisor                                            1110     0001
           Bit=1 → RemLeft + Divisor                                    0011     0001
           Bit=1 → Shift Rem left with '0'                              0110     0010
           (*) Shift RemLeft to the right with '0'                      0011     0010

(*) After the last iteration the left half of the remainder must be shifted to the right. 


FIGURE 3.14. Unsigned division algorithm utilizing only addition/subtraction and shift operations. 
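The shift-and-subtract procedure of Figure 3.14 can be sketched in Python as follows. The function name and the use of a single integer for the remainder register are assumptions made for illustration. One deliberate departure from the figure is also an assumption: the register is allowed one extra bit on the left so that the last left shift cannot lose the MSB of a large remainder before the final right shift.

```python
def restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Unsigned n-bit division using only subtraction, restore, and shifts."""
    if not (0 <= dividend < (1 << n) and 0 < divisor < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values, divisor nonzero")
    half = (1 << n) - 1
    # Initialization: dividend in the right half, then one left shift with '0'.
    rem = dividend << 1
    for _ in range(n):
        diff = (rem >> n) - divisor          # RemLeft - Divisor
        if diff < 0:
            # Negative: the old left half is kept (restored), '0' is shifted in.
            rem = rem << 1
        else:
            # Zero or positive: keep the difference and shift in '1'.
            rem = (diff << n) | (rem & half)
            rem = (rem << 1) | 1
    quotient = rem & half                    # right half of the register
    remainder = rem >> (n + 1)               # left half shifted right once
    return quotient, remainder


print(restoring_divide(13, 5, 4))  # prints (2, 3)
```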


3.7 Signed Division 


Signed division is normally done as if the numbers were unsigned. This implies that negative numbers 
must first undergo two’s complement transformation. Moreover, if the dividend and the divisor have 
different signs, then the quotient must be negated (two’s complemented), while the remainder must 
always carry the same sign as the dividend (in other words, if the dividend is negative, then the 
remainder must also undergo two’s complement transformation to convert it into a negative number). 

EXAMPLE 3.7 SIGNED DIVISION 
Divide "1001" (-7) by "0011" (+3). 


SOLUTION 


The two’s complement of -7 is 7, that is, "0111". Applying then the division algorithm, we obtain 
quotient="0010" (2) and remainder="0001" (1). However, because the dividend and divisor have 
different signs, the quotient must be negated, hence resulting quotient="1110" (-2). Moreover, the 
remainder must have the same sign as the dividend, which in this example is negative. Therefore, 
after two’s complement transformation, remainder="1111" (-1) results. 
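The sign rules of this section can be checked with a short Python sketch. The function names are hypothetical, and Python's built-in `divmod` on the magnitudes stands in for the unsigned division algorithm. Note that applying `divmod` directly to signed values would not follow these rules, because Python rounds the quotient toward minus infinity.

```python
def signed_divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Divide magnitudes, then fix the signs as described in Section 3.7."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    quotient, remainder = divmod(abs(dividend), abs(divisor))
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient          # different signs: negate the quotient
    if dividend < 0:
        remainder = -remainder        # remainder takes the dividend's sign
    return quotient, remainder


def twos_complement(value: int, n: int) -> str:
    """Return the n-bit two's complement vector of a signed integer."""
    return format(value & ((1 << n) - 1), f"0{n}b")


q, r = signed_divide(-7, 3)
print(twos_complement(q, 4), twos_complement(r, 4))  # prints 1110 1111
```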


3.8 Floating-Point Addition and Subtraction 


Binary floating-point (FP) representation (for reals) was described in Section 2.6, and it is summarized 
in Figure 3.15. The option in Figure 3.15(a), called single-precision, contains 32 bits, with one bit for the 
sign (S), 8 bits for the exponent (E), and 23 bits for the fraction (F). The option in Figure 3.15(b), called 
double-precision, contains 64 bits, with one bit for the sign (S), 11 bits for the exponent (E), and 52 bits for 
the fraction (F). The corresponding decimal value (y) in each case is determined as follows. 


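Single precision: 

$$y = (-1)^{S}\,(1 + F)\,2^{E-127} \qquad (3.1)$$

where, for normalized numbers: 

$$1 \le E \le 254 \quad \text{or} \quad -126 \le e \le 127 \qquad (3.2)$$

Double precision: 

$$y = (-1)^{S}\,(1 + F)\,2^{E-1023} \qquad (3.3)$$

where, for normalized numbers: 

$$1 \le E \le 2046 \quad \text{or} \quad -1022 \le e \le 1023 \qquad (3.4)$$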


In the equations above, E is the biased exponent, while e is the actual exponent, that is, e = E - 127 for 
single-precision or e = E - 1023 for double-precision. 
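As a quick check of Equation (3.1), the sketch below decodes a 32-bit pattern by hand and compares it with the value that Python's `struct` module assigns to the same bits. The function names are assumptions, and only normalized numbers are handled.

```python
import struct


def decode_single(bits: int) -> float:
    """Apply Equation (3.1) to a 32-bit single-precision pattern."""
    s = bits >> 31                          # sign bit S
    e_biased = (bits >> 23) & 0xFF          # biased exponent E
    f = (bits & 0x7FFFFF) / 2**23           # fraction F, 0 <= F < 1
    if not 1 <= e_biased <= 254:
        raise ValueError("only normalized numbers are handled in this sketch")
    return (-1) ** s * (1 + f) * 2.0 ** (e_biased - 127)


def reference_value(bits: int) -> float:
    """Let the struct module interpret the same 32 bits as a float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# 0x40280000 is S=0, E="10000000", F="0101000...0", i.e. 1.0101 x 2^1.
print(decode_single(0x40280000), reference_value(0x40280000))  # prints 2.625 2.625
```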

We now describe how computers perform arithmetic operations using floating-point numbers. We 
start with addition/subtraction in this section, followed by multiplication and division in the next two 
sections. 


(a) Single precision: S (1 bit) | E (8 bits, weights 2^7 ... 2^0) | F (23 bits, weights 2^-1 ... 2^-23) 

(b) Double precision: S (1 bit) | E (11 bits, weights 2^10 ... 2^0) | F (52 bits, weights 2^-1 ... 2^-52) 


FIGURE 3.15. (a) Single- and (b) double-precision floating-point representations (IEEE 754 standard). 

Floating-point addition/subtraction algorithm 


To add two positive or negative FP numbers, the four-step procedure below can be used. (Recall from 
Section 3.2 that addition and subtraction are essentially the same operation with only two’s complement 
needed to convert one into the other.) 


Step 1: If the exponents are different, make them equal by shifting the significand of the number with 
the smallest exponent to the right. 


Step 2: Add the complete significands. However, if a value is negative, enter its two’s complement in 
the sum. Note that 2 bits result on the left of the binary point. If the resulting MSB is 1, then the 
sum is negative, so S=1 and the result must be two’s complemented back to a positive value 
because in floating-point notation a negative number is represented with exactly the same bits 
as its positive counterpart (except for the sign bit). 


Step 3: Normalize the result (scientific notation) and check for exponent overflow (the exponent must 
remain inside the allowed range). 


Step 4: Truncation (with rounding) might be required if the fraction is larger than 23 bits or if it must 
be sent to a smaller register. To truncate and round it, suppress the unwanted bits, adding 1 
to the last bit if the first suppressed bit is 1. Renormalize the result, if necessary, and check for 
exponent overflow. 


EXAMPLE 3.8 FLOATING-POINT ADDITION 


Compute the single-precision FP addition 0.75+2.625. Assume that the final result must be 
represented with only 2 fraction bits. 


SOLUTION 


Floating-point representation for 0.75: $1.1 \cdot 2^{-1}$ (obtained in Example 2.9). 
Floating-point representation for 2.625: $1.0101 \cdot 2^{1}$ (obtained in Example 2.9). 
Now we can apply the 4-step procedure described above. 


Step 1: The smallest exponent must be made equal to the other: $1.1 \cdot 2^{-1} = 0.011 \cdot 2^{1}$. 


Step 2: Add the significands (with 2 bits on the left of the binary point): 1.0101+0.011=01.1011 
(MSB=0, so this result is positive). Thus $0.75 + 2.625 = 1.1011 \cdot 2^{1}$. 


Step 3: The result is already normalized and the exponent (e=1) is within the allowed range (-126 
to 127) given by Equation (3.2). 

Step 4: The fraction must be truncated to 2 bits. Because its third bit is 1, a 1 must be added to the last 
bit of the truncated significand, that is, 1.10+0.01=1.11. 
The final (truncated and normalized) result then is $0.75 + 2.625 = 1.11 \cdot 2^{1}$. 
From $1.11 \cdot 2^{1}$, the following single-precision FP representation results: 
S='0', F="11", and E="10000000" (because E=e+127=1+127=128). 


EXAMPLE 3.9 FLOATING-POINT SUBTRACTION 
Compute the single-precision FP subtraction 0.75 - 2.625. 

SOLUTION 


Floating-point representation for 0.75: $1.1 \cdot 2^{-1}$ (obtained in Example 2.9). 
Floating-point representation for 2.625: $1.0101 \cdot 2^{1}$ (obtained in Example 2.9). 
Now we can apply the 4-step procedure described above. 


Step 1: The smallest exponent must be made equal to the other: $1.1 \cdot 2^{-1} = 0.011 \cdot 2^{1}$. 


Step 2: Now we must add the significands. However, because one of the numbers (-2.625) is 
negative, it must be two’s complemented first (with 2 bits on the left of the binary point): 

01.0101 → complement and add one → (10.1010 + 00.0001) = 10.1011 

The addition of the significands then produces (again with 2 bits on the left of the binary point): 
00.011 + 10.1011 = 11.0001 (recall MSB check rule seen in Case 2 of Section 3.2). 
Because the MSB is 1, the result is negative, so S=1, and it must be two’s complemented back 
to a positive value: 

11.0001 → (00.1110 + 00.0001) = 00.1111 

Therefore, $0.75 - 2.625 = -0.1111 \cdot 2^{1}$. 

Step 3: The result above requires normalization: 
$-0.1111 \cdot 2^{1} = -1.111 \cdot 2^{0}$ (the exponent is still within the allowed range). 
From $-1.111 \cdot 2^{0}$, the following single-precision FP representation results: 
S='1', F="111000...000", and E="01111111" (because E=e+127=0+127=127). 
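The four-step procedure can be reproduced with exact rational arithmetic, as in the sketch below. It is an illustration under stated assumptions, not a model of real hardware: the function names are hypothetical, Python's signed `Fraction` arithmetic replaces the explicit two's complement of Step 2, and rounding follows the simple rule of Step 4 (add 1 when the first suppressed bit is 1).

```python
from fractions import Fraction


def normalize(mag: Fraction, exp: int) -> tuple[Fraction, int]:
    """Scale a positive magnitude into [1, 2), adjusting the exponent."""
    while mag >= 2:
        mag, exp = mag / 2, exp + 1
    while mag < 1:
        mag, exp = mag * 2, exp - 1
    return mag, exp


def fp_add(x: str, y: str, frac_bits: int = 23) -> tuple[int, str, str]:
    """Return (S, F, E) for x + y, with F truncated/rounded to frac_bits bits."""
    x, y = Fraction(x), Fraction(y)
    if x == 0 or y == 0 or x + y == 0:
        raise ValueError("zero has no normalized representation")
    (mx, ex), (my, ey) = normalize(abs(x), 0), normalize(abs(y), 0)
    # Step 1: shift the significand with the smaller exponent to the right.
    exp = max(ex, ey)
    mx, my = mx / 2 ** (exp - ex), my / 2 ** (exp - ey)
    # Step 2: add the signed significands and separate sign and magnitude.
    total = (mx if x > 0 else -mx) + (my if y > 0 else -my)
    sign, mag = (1 if total < 0 else 0), abs(total)
    # Step 3: normalize.
    mag, exp = normalize(mag, exp)
    # Step 4: keep frac_bits bits, adding 1 if the first suppressed bit is 1.
    scaled = mag * 2 ** frac_bits
    kept = int(scaled)
    if int(scaled * 2) & 1:
        kept += 1
    if kept >> (frac_bits + 1):              # rounding produced 10.000...
        kept, exp = kept >> 1, exp + 1
    if not -126 <= exp <= 127:
        raise OverflowError("exponent outside the normalized range")
    fraction = format(kept - (1 << frac_bits), f"0{frac_bits}b")
    return sign, fraction, format(exp + 127, "08b")


print(fp_add("0.75", "2.625", 2))   # prints (0, '11', '10000000')
print(fp_add("0.75", "-2.625")[2])  # prints 01111111
```

The two calls correspond to Examples 3.8 and 3.9, respectively.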


3.9 Floating-Point Multiplication 
To multiply two (positive or negative) floating-point numbers, the four-step procedure below can be used. 


Step 1: Add the actual exponents and check for exponent overflow. 

Step 2: Multiply the significands and assign the proper sign to the result. 

Step 3: Normalize the result, if necessary, and check again for exponent overflow. 
Step 4: Truncate and round the result, if necessary (same as Step 4 of addition). 


EXAMPLE 3.10 FLOATING-POINT MULTIPLICATION 
Using single-precision FP, compute (0.75) × (-2.625). Truncate/round the fraction to 3 bits. 


SOLUTION 


Floating-point representation for 0.75: $1.1 \cdot 2^{-1}$ (obtained in Example 2.9). 
Floating-point representation for 2.625: $1.0101 \cdot 2^{1}$ (obtained in Example 2.9). 
Now we can apply the 4-step procedure described above. 


Step 1: The resulting exponent is -1+1=0 (within the allowed range). 

Step 2: The multiplication of the significands produces (1.1) × (1.0101) = 1.11111, and the sign is '-'. 
Hence the product is $(0.75) \times (-2.625) = -1.11111 \cdot 2^{0}$. 

Step 3: The result above is already normalized. 


Step 4: The fraction above has 5 bits and must be reduced to 3. Because the 4th bit is 1, a 1 must be 
added to the last bit of the truncated significand, that is, 1.111+0.001=10.000. Renormalization 
is now needed, so the final result is $(0.75) \times (-2.625) = -1.000 \cdot 2^{1}$. 
From $-1.000 \cdot 2^{1}$, the following single-precision FP representation results: 


S='1', F="000", and E="10000000" (E=e+127=1+127=128). 


EXAMPLE 3.11 TRUNCATION ERROR 


Convert the result obtained in Example 3.10 back to decimal to calculate the error introduced by the 
truncation. 


SOLUTION 


Exact result: (0.75) × (-2.625) = -1.96875 

Truncated result (from Example 3.10): $(0.75) \times (-2.625) = -1.000 \cdot 2^{1}$ 

Decimal value for the truncated result (Equation (3.1)): $y = (-1)^{1}(1+0)\,2^{1} = -2$ 
Error = (|1.96875 - 2| / 1.96875) × 100 = 1.59% 
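Examples 3.10 and 3.11 can be reproduced together with the sketch below, which follows the four multiplication steps using exact fractions and then evaluates the relative error. The function names and the returned tuple layout are assumptions made for illustration.

```python
from fractions import Fraction


def split(x: Fraction) -> tuple[Fraction, int]:
    """Return (significand in [1, 2), exponent) for a nonzero value."""
    mag, exp = abs(x), 0
    while mag >= 2:
        mag, exp = mag / 2, exp + 1
    while mag < 1:
        mag, exp = mag * 2, exp - 1
    return mag, exp


def fp_multiply(x: str, y: str, frac_bits: int):
    """Return (sign, significand bits as integer, exponent, exact value)."""
    x, y = Fraction(x), Fraction(y)
    if x == 0 or y == 0:
        raise ValueError("zero has no normalized representation")
    (mx, ex), (my, ey) = split(x), split(y)
    exp = ex + ey                                   # Step 1: add exponents
    mag = mx * my                                   # Step 2: multiply significands
    sign = 1 if (x < 0) != (y < 0) else 0
    if mag >= 2:                                    # Step 3: normalize
        mag, exp = mag / 2, exp + 1
    kept = int(mag * 2 ** frac_bits)                # Step 4: truncate ...
    if int(mag * 2 ** (frac_bits + 1)) & 1:         # ... and round
        kept += 1
    if kept >> (frac_bits + 1):                     # renormalize 10.000...
        kept, exp = kept >> 1, exp + 1
    value = (-1) ** sign * Fraction(kept, 2 ** frac_bits) * Fraction(2) ** exp
    return sign, kept, exp, value


sign, kept, exp, value = fp_multiply("0.75", "-2.625", 3)
exact = Fraction("0.75") * Fraction("-2.625")
error_percent = float(abs(exact - value) / abs(exact) * 100)
print(sign, format(kept, "b"), exp, value, round(error_percent, 2))  # prints 1 1000 1 -2 1.59
```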


3.10 Floating-Point Division 


To divide two (positive or negative) floating-point numbers, the four-step procedure below can be used. 


Step 1: Subtract the exponents and check for overflow. 

Step 2: Divide the significands and assign the proper sign to the result. 

Step 3: Normalize the result, if necessary. 

Step 4: Truncate and round the result, if necessary (same as Step 4 of addition). 


EXAMPLE 3.12 FLOATING-POINT DIVISION 


Compute the division (0.75) ÷ (-2.625) using single-precision floating-point numbers. 


SOLUTION 

Floating-point representation for 0.75: $1.1 \cdot 2^{-1}$ (obtained in Example 2.9). 
Floating-point representation for 2.625: $1.0101 \cdot 2^{1}$ (obtained in Example 2.9). 
Now we can apply the 4-step procedure described above. 


Step 1: The resulting exponent is -1-1=-2 (within the allowed range). 

Step 2: The division of the significands produces (1.1) ÷ (1.0101) = 1.001001001..., and the sign is '-'. 
Hence the result is $-1.001001001\ldots \cdot 2^{-2}$. 

From $-1.001001001\ldots \cdot 2^{-2}$, the following single-precision FP representation results: 
S='1', F="001001001...", and E="01111101" (because E=e+127=-2+127=125). 
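The repeating pattern of the quotient in Step 2 can be verified with a bit-by-bit long division, sketched below. The function name is hypothetical, and the two significands are written as integers with four fraction bits each (1.1000 and 1.0101) so that their ratio is unchanged.

```python
def significand_quotient(a: int, b: int, frac_bits: int) -> str:
    """Binary long division a / b: integer part plus frac_bits fraction bits."""
    int_part, rem = divmod(a, b)
    bits = []
    for _ in range(frac_bits):
        rem *= 2                      # shift the partial remainder left
        if rem >= b:
            bits.append("1")
            rem -= b
        else:
            bits.append("0")
    return f"{int_part:b}." + "".join(bits)


quotient = significand_quotient(0b11000, 0b10101, 9)
biased_exponent = format((-1 - 1) + 127, "08b")   # Step 1 exponent plus the bias
print(quotient, biased_exponent)  # prints 1.001001001 01111101
```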
3.11 Exercises 


3.1 Unsigned addition #1 

Consider an adder with unsigned 5-bit inputs and 5-bit output. 
a. Determine the (decimal) range of each input and of the output. 
b. Representing all signals using regular binary code, calculate 16+15 and check whether the result 
is valid. 
c. Repeat part (b) above for 16+16. 


3.2 Unsigned addition #2 

Repeat Exercise 3.1 assuming that the output is 6 bits wide. 


3.3 Unsigned addition #3 

Given the unsigned values a="111101", b="000001", and c="100001", and knowing that x, y, and z 
are also unsigned 6-bit numbers, calculate and check the result of: 
a. x=a+b 
b. y=a+c 
c. z=b+c 


3.4 Signed addition #1 

Consider an adder with signed 5-bit inputs and 5-bit output. 
a. Determine the (decimal) range of each input and of the output. 
b. Using binary vectors, with negative numbers in two’s complement format, calculate -8-8 and 
check whether the result is valid. 
c. Repeat part (b) above for -8-9. 
d. Repeat part (b) above for 8-9. 
e. Repeat part (b) above for 8+8. 


3.5 Signed addition #2 

Repeat Exercise 3.4 assuming that the output is 6 bits wide. 


3.6 Signed addition #3 

Given the signed values a="111101", b="000001", and c="100001", and knowing that x, y, and z are 
also signed 6-bit values, calculate and check the result of: 
a. x=a+b 
b. y=a+c 
c. z=b+c 

3.7 Logical shift #1 

Write the resulting vectors when the logical shift operations below are executed. 
a. "11010" SRL 2 
b. "01011" SRL -3 
c. "10010" SLL 2 
d. "01011" SLL 3 


3.8 Logical shift #2 

Write the resulting vectors when the logical shift operations below are executed. 
a. "111100" SRL 2 
b. "010000" SRL 3 
c. "111100" SLL -2 
d. "010011" SLL 3 


3.9 Arithmetic shift #1 

Write the resulting vectors when the arithmetic shift operations below are executed. 
a. "11010" SRA 2 
b. "01011" SRA -3 
c. "10010" SLA 2 
d. "01011" SLA 3 


3.10 Arithmetic shift #2 

Write the resulting vectors when the arithmetic shift operations below are executed. 
a. "111100" SRA 2 
b. "010000" SRA 3 
c. "111100" SLA -2 
d. "010011" SLA 1 


3.11 Circular shift #1 

Write the resulting vectors when the circular shift operations below are executed. 
a. "11010" ROL 1 
b. "01011" ROL -3 
c. "10010" ROR 2 
d. "01011" ROR 3 


3.12 Circular shift #2 

Write the resulting vectors when the circular shift operations below are executed. 
a. "111100" ROL 2 
b. "010000" ROL 3 
c. "111100" ROR -2 
d. "010011" ROR 3 

3.13 Shift × unsigned multiplication 

Logically shift each of the unsigned numbers below one position to the left and check whether the 
number gets multiplied by 2. Are there any restrictions (overflow) in this case? 
a. "001111" 
b. "010001" 
c. "110011" 


3.14 Shift × signed multiplication 

Arithmetically shift each of the signed numbers below one position to the left and check whether the 
number gets multiplied by 2. Are there any restrictions (overflow) in this case? 
a. "001111" 
b. "010001" 
c. "110011" 
d. "100001" 


3.15 Shift × unsigned division 

Logically shift each of the unsigned numbers below two positions to the right and check whether the 
number gets divided by 4. Are there any restrictions in this case? 
a. "001100" 
b. "000110" 
c. "111101" 


3.16 Shift × signed division 

Arithmetically shift each of the signed numbers below two positions to the right and check whether 
the number gets divided by 4. Are there any restrictions in this case? 
a. "001100" 
b. "000110" 
c. "111101" 


3.17 Unsigned multiplication #1 

Using the multiplication algorithm of Figure 3.8(a), multiply the following unsigned 5-bit numbers: 
a. 7 × 31 
b. 14 × 16 

3.18 Unsigned multiplication #2 

Repeat the exercise above using the unsigned multiplication algorithm of Figure 3.9. 


3.19 Signed multiplication #1 

Using the multiplication algorithm of Figure 3.10, multiply the following signed 5-bit numbers: 
a. -6 × 14 
b. -16 × 8 


3.20 Signed multiplication #2 

Repeat the exercise above using Booth’s algorithm for signed multiplication (Figure 3.12). 


3.21 Unsigned division 

Using the division algorithm of Figure 3.14, divide the following unsigned 5-bit numbers: 
a. 31 ÷ 7 
b. 16 ÷ 4 


3.22 Signed division 

Using the division algorithm of Figure 3.14, plus the description in Section 3.7, divide the following 
signed 5-bit numbers: 
a. 14 ÷ -6 
b. -16 ÷ -3 


3.23 Floating-point addition/subtraction #1 

a. Show that the single-precision floating-point representations for 1 and 0.875 are as follows. 
For 1: S='0', F="00...0" (=0), E="01111111" (=127). 
For 0.875: S='0', F="1100...0" (=0.75), E="01111110" (=126). 
Now, using the procedure described in Section 3.8, calculate the following: 
b. 1 + 0.875 
c. -1 + 0.875 


3.24 Floating-point addition/subtraction #2 

Using single-precision floating-point representations and the procedure described in Section 3.8, 
determine: 
a. 12.5 - 0.1875 
b. 4.75 + 25 


3.25 Floating-point addition/subtraction #3 

Using single-precision floating-point representations and the procedure described in Section 3.8, 
determine: 
a. 8.125 - 8 
b. -19 - 32.0625 


3.26 Floating-point multiplication #1 

a. Show that the single-precision floating-point representations for 4.75 and 25 are as follows. 
For 4.75: S='0', F="00110...0" (=0.1875), E="10000001" (=129). 
For 25: S='0', F="10010...0" (=0.5625), E="10000011" (=131). 
Now, using the procedure described in Section 3.9, calculate the following products: 
b. 4.75 × 25 
c. (-4.75) × 25 


3.27 Floating-point multiplication #2 

Using single-precision floating-point representations and the procedure described in Section 3.9, 
determine the products below. The result should be truncated (and rounded) to 3 fraction bits (if 
necessary). 
a. 8.125 × (-8) 
b. (19) × (-12.5) 


3.28 Floating-point division #1 

Using single-precision floating-point representations and the procedure described in Section 3.10, 
determine the ratios below. 
a. 4.75 ÷ 25 
b. 8.125 ÷ (-8) 


3.29 Floating-point division #2 

Using single-precision floating-point representations and the procedure described in Section 3.10, 
determine the ratios below. The result should be truncated (and rounded) to 3 fraction bits (if necessary). 
a. (-4.75) ÷ (-0.1875) 
b. (-19) ÷ (-12.5) 
Introduction to Digital Circuits 


Objective: This chapter introduces the most fundamental logic gates and also the most fundamental 
of all registers along with basic applications. Moreover, to make the presentations more effective and 
connected to the physical implementations, the corresponding electronic circuits, using CMOS 
architecture, are also included. Even though many other details will be seen later, the concepts selected for 
inclusion in this chapter are those absolutely indispensable for the proper understanding and 
appreciation of the next chapters. 

4.1 Introduction to MOS Transistors 


As mentioned in the introduction, the electronic circuits for the gates and register described in this chapter 
will also be presented, which requires some knowledge about transistors. Because MOS transistors are 
studied later (Chapter 9), a brief introduction is presented here. 

Almost all digital circuits are constructed with a type of transistor called MOSFET (metal oxide 
semiconductor field effect transistor), or simply MOS transistor. There are two types of MOS transistors, 
one called n-channel MOS (or simply nMOS) because its internal channel is constructed with an n-type 
semiconductor, and the other called p-channel MOS (or simply pMOS) because its channel is of type p. 

In simple terms, their operation can be summarized as follows: An nMOS transistor is 
ON (emulating a closed switch) when its gate voltage is high, and it is OFF (open switch) when its gate 
voltage is low; conversely, a pMOS transistor is ON (closed switch) when its gate voltage is low, and it 
is OFF (open switch) when its gate voltage is high. 

The operation principle mentioned above is illustrated in Figure 4.1. In Figure 4.1(a), the basic 
circuit is shown, with the transistor’s source terminal (S) connected to the ground (0 V, represented by a 
triangle or GND), its drain (D) connected through a resistor to the power supply (represented by $V_{DD}$, 
which is 5 V in this example), and its gate (G) connected to the input signal source (x). The situation for 
x=0 V is depicted in Figure 4.1(b). Because a low gate voltage turns the nMOS transistor OFF, no electric 
current flows through the resistor, causing the output voltage to remain high (in other words, when 
x='0', y='1'); this situation is further illustrated by means of an open switch on the right of Figure 4.1(b). 
The opposite situation is shown in Figure 4.1(c), now with the transistor’s gate connected to 5 V. Because 
a high gate voltage turns the nMOS transistor ON, electric current now flows through R, thus lowering 
the output voltage (in other words, when x='1', y='0'); this situation is further illustrated by means of a 
closed switch on the right of Figure 4.1(c). 

A similar analysis for a pMOS transistor is presented in Figure 4.2 (note the little circle at the 
transistor’s gate, which differentiates it from the nMOS type). In Figure 4.2(a), the basic circuit is shown, 
with the transistor’s source terminal (S) connected to $V_{DD}$, the drain (D) connected through a resistor 
to ground, and its gate (G) connected to the input signal source (x). The situation for x=5 V is depicted 
in Figure 4.2(b). Because a high gate voltage turns the pMOS transistor OFF, no electric current flows 
through the resistor, causing the output voltage to remain low (in other words, when x='1', y='0'); this 
situation is further illustrated by means of an open switch on the right of Figure 4.2(b). The opposite 
situation is shown in Figure 4.2(c), now with the transistor’s gate connected to 0 V. Because a low gate 
voltage turns the pMOS transistor ON, electric current now flows through R, thus raising the output 
voltage (in other words, when x='0', y='1'); this situation is further illustrated by means of a closed switch 
on the right of Figure 4.2(c). 

The operation of both nMOS and pMOS transistors is further illustrated in Figure 4.3, which shows a 
square waveform (x) applied to their gates. Note that in both cases the output (y) is always the opposite 
of the input (when one is high, the other is low, and vice versa), so this circuit is an inverter. Note also 
that some time delay between x and y is inevitable. 

FIGURE 4.1. Digital operation of an nMOS transistor. 

FIGURE 4.3. Further illustrations of the digital behavior of (a) nMOS and (b) pMOS transistors. 


4.2 Inverter and CMOS Logic 


Having seen how MOS transistors work, we can start discussing the structures of fundamental digital 
gates. However, with MOS transistors several distinct architectures can be devised for the same gate 
(which will be studied in detail in Chapter 10). Nevertheless, there is one among them, called CMOS, 
that is used much more often than any other, so it will be introduced in this chapter. 

CMOS stands for complementary MOS because for each nMOS transistor there also is a pMOS one. 
As will be shown, this arrangement allows the construction of digital circuits with very low power con- 
sumption, which is the main reason for its huge popularity. For each gate described in this and following 
sections, the respective CMOS circuit will also be presented. 


4.2.1 Inverter 


The inverter is the most basic of all logic gates. Its symbol and truth table are depicted in Figure 4.4. It 
simply complements the input (if x='0', then y='1', and vice versa), so its logical function can be 
represented as follows, where some properties are also shown. 

Inverter function: 
y=x’ (4.1) 
Inverter properties: 
0’=1, 1’=0, (x’)’=x (4.2) 


The inverter is not only the most basic logic gate, but it is also physically the simplest of all logic 
gates. Indeed, both circuits seen in Figures 4.1-4.2 (or 4.3) are inverters because both implement 
the function y=x’. However, those circuits exhibit major limitations for use in digital systems, like 
excessive power consumption. For example, observe in Figure 4.1(c) that, during the whole time in 
which x='1', electric current flows through the circuit, so power is consumed, and the same occurs in 
Figure 4.2(c) for x='0'. Another major limitation is the large silicon space needed to construct the resistor. 
In summary, that type of inverter, though fine for analog (linear) systems, is not appropriate for digital 
applications. As mentioned earlier, in digital circuits the most common architecture is CMOS, which is 
described below. 


4.2.2 CMOS Logic 


An inverter constructed using the CMOS logic architecture is shown in Figure 4.5(a). Comparing it to 
the inverters of Figures 4.1-4.2, we observe that it employs two (complementary) transistors instead of 
one and that resistors are not required. Because a CMOS gate requires nMOS and pMOS transistors, the 
CMOS inverter is the simplest CMOS circuit because it contains only one transistor of each type. 

The operation of the CMOS inverter is illustrated in Figures 4.5(b)-(c). In Figure 4.5(b), x='0' (0 V) 
is applied to the input, which causes the pMOS transistor to be ON and the nMOS to be OFF 
(represented by a closed and an open switch, respectively). Thus, the upper switch connects y to $V_{DD}$, yielding 
y='1', while the lower switch, being open, guarantees the absence of static electric current through 
the circuit (from $V_{DD}$ to GND). The opposite situation is shown in Figure 4.5(c), that is, x='1' (5 V in this 
example) is applied to the input, causing the nMOS to be ON and the pMOS to be OFF (again represented 
by a closed and an open switch, respectively). This time the lower switch connects y to GND, yielding 
y='0', while the upper switch, being open, guarantees the absence of static electric current from $V_{DD}$ 
to GND. 

FIGURE 4.4. Inverter symbol and truth table. 

FIGURE 4.5. (a) CMOS inverter; Operation with (b) x='0' and (c) x='1'. 
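The switch-level behavior just described can be captured in a very small Python model. It is only a sketch: the function names are hypothetical, logic levels are idealized as 0 and 1, and no voltages or delays are modeled.

```python
def nmos_on(gate: int) -> bool:
    """An nMOS transistor conducts when its gate is high."""
    return gate == 1


def pmos_on(gate: int) -> bool:
    """A pMOS transistor conducts when its gate is low."""
    return gate == 0


def cmos_inverter(x: int) -> tuple[int, bool]:
    """Return (y, static_path) for a CMOS inverter driven by x."""
    pull_up = pmos_on(x)        # upper switch: connects y to the supply
    pull_down = nmos_on(x)      # lower switch: connects y to GND
    static_path = pull_up and pull_down   # both ON would short supply to GND
    y = 1 if pull_up else 0
    return y, static_path


for x in (0, 1):
    print(x, cmos_inverter(x))
```

For both input values exactly one switch is closed, which is why the idealized circuit has no static current path from the supply to ground.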


4.2.3 Power Consumption 


Power consumption is a crucial specification in modern digital systems, particularly for portable (battery- 
operated) devices. It is classified as static or dynamic, where the former is the power consumed while the 
circuit remains in the same state, and the latter is the power consumed when the circuit changes its state. 
The total power consumption ($P_T$) is therefore given by: 


$$P_T = P_{static} + P_{dynamic} \qquad (4.3)$$


The inverter seen in Figure 4.1 is an example of a circuit that consumes static power. Note in Figure 4.1(c) 
that during the whole time while x='1' a drain current ($I_D$) flows through the circuit (from $V_{DD}$ to GND), 
hence causing a static power consumption $P_{static} = V_{DD} \cdot I_D$. For example, if $V_{DD}$ = 5 V and $I_D$ = 1 mA, then 
5 mW of static power results. The same reasoning is valid for the inverter of Figure 4.2. 

Contrary to the inverters mentioned above, CMOS circuits (Figure 4.5, for example) practically do 
not consume static power (which is one of the main attributes of CMOS logic) because, as seen in Figure 
4.5, one of the transistor sections (nMOS or pMOS) is guaranteed to be OFF while there is no switching 
activity. 

It is important to observe, however, that as the CMOS technology shrinks (currently at 65 nm), 
leakage currents grow, so static power consumption due to leakage tends to become a major factor in future 
technologies (45 nm and under). Static power due to leakage is illustrated in Figure 4.6(a), where the 
circuit is idle, so leakage is the only type of current that can occur. (Even though the leakage current was 
represented as flowing only through the transistor drains, its composition is more complex, containing also 
contributions from the gates.) 

Contrary to the static power, dynamic power is consumed by all sorts of digital circuits. This is 
particularly relevant in CMOS circuits because it constitutes their main type of power consumption. 

The dynamic power of CMOS circuits can be divided into two parts, called short-circuit ($P_{short}$) and 
capacitive ($P_{cap}$) power consumptions, that is: 


$$P_{dynamic} = P_{short} + P_{cap} \qquad (4.4)$$

FIGURE 4.6. Static and dynamic power consumptions in a CMOS inverter: (a) Idle mode, so only leakage 
currents can occur (static power consumption); (b) Short-circuit (dynamic) power consumption, which occurs 
when one transistor is turned ON with the other still partially ON; (c) Capacitive (also dynamic) power 
consumption, needed to charge the capacitor during '0' → '1' output transitions; (d) Capacitor discharged to 
ground during '1' → '0' output transitions (this does not consume power from $V_{DD}$). 

The dynamic power consumption is illustrated in Figures 4.6(b)-(d), where $C_L$ represents the total load 
capacitance (parasitic or not) at the output node. The case in Figure 4.6(b) corresponds to the moment 
when one transistor is being turned ON and the other is being turned OFF, so for a brief moment both 
are partially ON, causing a short-circuit current ($I_{short}$) to flow from $V_{DD}$ to GND. The value of $P_{short}$ is 
obtained by integrating $I_{short}$ and multiplying it by $V_{DD}$. Note that this occurs at both signal transitions 
(that is, '0' → '1' and '1' → '0'). 

Figure 4.6(c) shows a '0' → '1' transition at the output (after the nMOS transistor has been turned OFF 
and so only the pMOS transistor is ON). The capacitor is charged toward $V_{DD}$, causing the power $P_{cap}$ 
to be consumed from the power supply. The value of $P_{cap}$ can be obtained by integrating the product 
$i(t) \cdot V_{DD}$. 

$P_{cap}$ can be determined as follows. Say that a complete charge-discharge cycle occurs every T seconds 
(that is, with a frequency f=1/T). The amount of charge needed to charge a capacitor is Q=CV, where 
C is the capacitance and V is the final voltage, so in our case $Q_{C_L} = C_L V_{DD}$. The power supplied by $V_{DD}$ is 
determined by averaging the product v(t)·i(t) over the period T, where v(t) and i(t) are the voltage and 
current provided by the power supply. Doing so, the following results: 


$$P_{cap} = \frac{1}{T}\int_0^T v(t)\,i(t)\,dt = \frac{1}{T}\int_0^T V_{DD}\,i(t)\,dt = \frac{V_{DD}}{T}\int_0^T i(t)\,dt = \frac{V_{DD}}{T}\,C_L V_{DD} = C_L V_{DD}^2 f \qquad (4.5)$$


In the last step above it was considered that $C_L$ gets fully charged to $V_{DD}$. Also recall that $P_{cap}$ relates to 
the energy consumed from the power supply to charge $C_L$, not to the energy stored on $C_L$ (the latter is only 
50% of the former; the other 50% is dissipated in the transistor). f=1/T represents the frequency with 
which the '0' → '1' transitions occur at the output (also called switching frequency, sometimes represented 
by $f_{0 \to 1}$). During the '1' → '0' output transitions no power is consumed from $V_{DD}$. 
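The 50% split mentioned above follows from a short energy balance, sketched here under the same assumption that the capacitor charges fully from 0 to the supply voltage. The symbols $E_{supply}$, $E_{stored}$, and $E_{dissipated}$ are introduced only for this illustration, and $i(t) = C_L\,dv/dt$ is the capacitor current.

```latex
\begin{align*}
E_{supply} &= \int_0^{\infty} V_{DD}\, i(t)\, dt = V_{DD}\, Q_{C_L} = C_L V_{DD}^2 \\
E_{stored} &= \int_0^{\infty} v(t)\, i(t)\, dt = \int_0^{V_{DD}} C_L\, v \, dv = \tfrac{1}{2}\, C_L V_{DD}^2 \\
E_{dissipated} &= E_{supply} - E_{stored} = \tfrac{1}{2}\, C_L V_{DD}^2
\end{align*}
```

Dividing $E_{supply}$ by the period T gives the same $C_L V_{DD}^2 f$ obtained in Equation 4.5.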

In some cases, an “equivalent” (higher) $C_L$ is used (called $C_{Leq}$ in the equation that follows), which 
encompasses the short-circuit effect described above (Figure 4.6(b)). Consequently: 


$$P_{dynamic} \approx C_{Leq} V_{DD}^2 f \qquad (4.6)$$


However, when the actual load is large ($C_L$ = 0.5 pF, for example) or the switching signal’s transitions 
are very fast, the short-circuit power consumption tends to be negligible, so the actual value of $C_L$ can be 
employed in Equation 4.6. 
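To get a feeling for the numbers, Equation 4.6 can be evaluated for a few supply voltages. In the sketch below the load of 0.5 pF is the value quoted above, whereas the 100 MHz switching frequency and the list of supply voltages are assumptions chosen only for illustration.

```python
def dynamic_power(c_load: float, vdd: float, freq: float) -> float:
    """Dynamic power in watts: load capacitance x supply voltage squared x frequency."""
    return c_load * vdd ** 2 * freq


C_LOAD = 0.5e-12     # 0.5 pF, the load value quoted in the text
F_SWITCH = 100e6     # 100 MHz switching frequency (hypothetical)

for vdd in (5.0, 3.3, 1.0):
    power_uw = dynamic_power(C_LOAD, vdd, F_SWITCH) * 1e6
    print(f"VDD = {vdd} V -> {power_uw:.1f} uW")
```

Because of the quadratic dependence, lowering the supply from 5 V to 1 V reduces the dynamic power by a factor of 25 for the same load and frequency.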


4.2.4 Power-Delay Product 


The power-delay (PD) product is an important measure of circuit performance. Equation 4.6 shows 
that the power consumption is proportional to $V_{DD}^2$, hence it is important to reduce $V_{DD}$ to reduce the 
power. On the other hand, it will be shown in Chapter 9 (see Equations 9.8 and 9.9) that the transient 
response of a CMOS inverter is roughly proportional to $1/V_{DD}$, so $V_{DD}$ must be increased to improve 
the speed. 

Removing some of the simplifications used in Equations 9.8 and 9.9, an equation of the form 
$kV_{DD}/(V_{DD}-V_T)^2$ indeed results for the delay (where k is a constant and $V_T$ is the transistor’s threshold voltage). 
Multiplying it by Equation 4.6 to obtain the PD product, and then taking its derivative, an important 
conclusion results, which shows that PD is minimum when $V_{DD} \approx 3V_T$. 
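The minimization step can be written out as follows. This is a sketch that assumes the load capacitance, the frequency, and the constant k do not depend on the supply voltage, and that the supply voltage exceeds the threshold voltage.

```latex
\begin{align*}
PD &\propto V_{DD}^2 \cdot \frac{V_{DD}}{(V_{DD}-V_T)^2} = \frac{V_{DD}^3}{(V_{DD}-V_T)^2} \\
\frac{d(PD)}{dV_{DD}} &\propto \frac{3V_{DD}^2\,(V_{DD}-V_T) - 2V_{DD}^3}{(V_{DD}-V_T)^3}
  = \frac{V_{DD}^2\,(V_{DD}-3V_T)}{(V_{DD}-V_T)^3} \\
\frac{d(PD)}{dV_{DD}} &= 0 \;\Rightarrow\; V_{DD} = 3V_T
\end{align*}
```

The derivative is negative below $3V_T$ and positive above it, so the stationary point is a minimum of this simplified model.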

This implies that to optimize the PD product V_DD cannot be reduced indefinitely, but V_T must be 
reduced as well. This, however, brings other problems, like the reduction of the noise margin (studied in 
Chapter 10). Typical values for V_DD and V_T in current (65 nm) CMOS technology are 1 V and 0.4 to 0.5 V, 
respectively. 


4.2.5 Logic Voltages 


Another important parameter of a specific logic architecture is its supply voltage (V_DD, introduced 
in Section 1.8), along with the allowed signal ranges. The first integrated CMOS family, called HC 
(Section 10.6), employed a nominal supply voltage of 5 V, borrowed from the TTL (transistor-transistor 
logic) family (Section 10.3), constructed with bipolar transistors, which preceded CMOS. 

Modern designs employ lower supply voltages, which reduce the power consumption (note in 
Equation 4.6 that power is proportional to the square of the supply voltage). The complete set of LVCMOS 
(low-voltage CMOS) standards at the time of this writing is depicted in Figure 4.7, ranging from 3.3 V 
(older) to 1 V (newest). For example, current top-performance FPGA chips (described in Chapter 18) 
operate with V_DD=1 V (or even 0.9 V). 

Figure 4.7 also shows the allowed voltage ranges for each I/O. The vertical bar on the left represents 
gate output voltages, while the bar on the right corresponds to gate input voltages. For example, the 
'0' output from a 2.5 V LVCMOS gate is in the 0 V to 0.4 V range, while a '0' input to a similar gate is 
required to fall in the 0 V to 0.7 V range. This gives a noise margin when low (NM_L) of 0.3 V. Likewise, 
the '1' output produced by such a gate falls in the 2 V to 2.5 V range, while a '1' input is required to 
be in the 1.7 V to 2.5 V range. Thus the noise margin when high (NM_H) is also 0.3 V. This signifies that 
any noise whose amplitude is lower than 0.3 V cannot corrupt the signals traveling from one gate to 
another in this family. 
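
The noise-margin arithmetic above can be written as a small helper. The function and parameter names are hypothetical; the four voltages are the 2.5 V LVCMOS limits quoted in the text.

```python
def noise_margins(v_ol_max, v_il_max, v_oh_min, v_ih_min):
    """Return (NM_L, NM_H) from worst-case output and input thresholds."""
    nm_low = v_il_max - v_ol_max    # room above the highest valid '0' output
    nm_high = v_oh_min - v_ih_min   # room below the lowest valid '1' output
    return nm_low, nm_high

# 2.5 V LVCMOS limits quoted in the text
nm_l, nm_h = noise_margins(v_ol_max=0.4, v_il_max=0.7, v_oh_min=2.0, v_ih_min=1.7)
print(round(nm_l, 3), round(nm_h, 3))
```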


4.2.6 Timing Diagrams for Combinational Circuits 


We conclude this section by reviewing the concept of timing diagrams (introduced in Section 1.10), which 
constitute a fundamental tool for graphically representing the behavior of digital circuits. More 
specifically, they show the response of a circuit to a large-amplitude fast-varying stimulus (normally '0' → '1' 
and '1' → '0' transitions). 

Three timing diagram versions were introduced in Figure 1.13, and a more detailed plot was pre- 
sented in Figure 1.16. The plots of Figure 1.13, adapted to the inverter, are repeated in Figure 4.8 below. 

The option in Figure 4.8(a) is obviously the simplest and cleanest. However, it does not show 
time-related parameters, so it is normally referred to as a functional timing diagram. The case in Figure 4.8(b) is 
a little more realistic because propagation delays are taken into account. The high-to-low propagation 
delay is called t_pHL, while the low-to-high is called t_pLH. Finally, the plots in Figure 4.8(c) depict the fact 
that the transitions are never perfectly vertical (though represented in a linearized form), with the time 
delays measured at the midpoint between the two logic voltages. In modern technologies, these 
propagation delays are in the subnanosecond range. 

FIGURE 4.7. Standard LVCMOS supply voltages and respective input-output voltage ranges. 

FIGURE 4.8. Three timing diagram (transient response) versions for an inverter. 


EXAMPLE 4.1 BUFFER TIMING DIAGRAM 


Consider the buffer shown in Figure 4.9(a), to which the stimulus a depicted in the first waveform 
of Figure 4.9(b) is applied. Draw the waveforms for x and y in the following two situations: 


a. With a negligible time delay (hence this is a functional analysis similar to that in Figure 4.8(a)). 


b. Knowing that the propagation delays through the inverters that compose the buffer are 
t_pHL_inv1 = 1 ns, t_pLH_inv1 = 2 ns, t_pHL_inv2 = 3 ns, and t_pLH_inv2 = 4 ns, and that the time slots in 
Figure 4.9(c) are 1 ns wide. To draw the waveforms, adopt the style seen in Figure 4.8(b). 


SOLUTION 
a. The resulting waveforms are included in Figure 4.9(b) with no delay between a, x, and y. 


b. The resulting waveforms are included in Figure 4.9(c). Gray shades were used to highlight the 
propagation delays (1, 2, 3, and 4 ns). Note that, as expected, the delay gets accumulated in y (5 ns 
at both up and down transitions). 

FIGURE 4.9. Buffer and corresponding timing diagrams of Example 4.1. 
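
The accumulated 5 ns can be checked with a short sketch. It assumes that the buffer is two cascaded inverters with x as the intermediate node, as in the example, and that each inverter contributes t_pLH when its own output rises and t_pHL when it falls. The function names are hypothetical.

```python
def inverter_delay(output_rises, t_phl, t_plh):
    """Delay of one inverter: t_pLH when its output rises, t_pHL when it falls."""
    return t_plh if output_rises else t_phl

def buffer_delay(input_rises, inv1, inv2):
    """Total delay from a to y for two cascaded inverters.

    inv1 and inv2 are (t_pHL, t_pLH) pairs in ns.
    """
    x_rises = not input_rises        # first inverter flips the edge
    y_rises = not x_rises            # second inverter flips it back
    return inverter_delay(x_rises, *inv1) + inverter_delay(y_rises, *inv2)

INV1 = (1, 2)  # t_pHL_inv1, t_pLH_inv1 (ns)
INV2 = (3, 4)  # t_pHL_inv2, t_pLH_inv2 (ns)
print(buffer_delay(True, INV1, INV2), buffer_delay(False, INV1, INV2))
```

A rising edge at a costs 1 ns + 4 ns and a falling edge costs 2 ns + 3 ns, so both add up to 5 ns.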


Note: The CMOS inverter analysis will continue in Sections 9.5 and 9.6 after we learn more about MOS 
transistors. Its construction, main parameters, DC response, and transition voltage will be presented in 
Section 9.5, while its transient response will be seen in Section 9.6. 


4.3 AND and NAND Gates 


Having seen how CMOS logic is constructed and how the CMOS inverter operates, we now begin 
describing the other fundamental gates, starting with the AND/NAND pair. 

Figure 4.10 shows AND and NAND gates, both with symbol, truth table, and respective CMOS imple- 
mentation. Contrary to what one might initially expect, NAND requires less hardware than AND (the 
latter is a complete NAND plus an inverter). 

As the name says (and the truth table shows), an AND gate produces a '1' at the output when a and b 
are high or, more generally, when all inputs are high. Therefore, its logical function can be represented 
by a "product", as shown below, where some AND properties are also included. 

AND function: 


y = a·b (4.7) 

AND properties: 

a·0 = 0, a·1 = a, a·a = a, a·a' = 0 (4.8) 

The corresponding CMOS circuit is shown on the right of Figure 4.10(a), which consists of a NAND 
circuit plus a CMOS inverter. 

A NAND gate is shown in Figure 4.10(b), which produces the complemented version of the AND 
gate. 

NAND function: 

y = (a·b)' (4.9) 

From a hardware perspective, a simpler circuit results in this case because NAND is an inverting gate, 
and all transistors (as seen in Section 4.1) are inverters by nature. 

FIGURE 4.10. (a) AND gate and (b) NAND gate (symbol, truth table, and CMOS circuit). 


The NAND circuit on the right of Figure 4.10(b) operates as follows. Suppose that a='1' (=V_DD) and 
b='0' (=0 V). In this case, the bottom nMOS transistor is OFF (due to b='0') and the top one is ON (due 
to a='1'). Reciprocally, the left pMOS is OFF (due to a='1') and the right pMOS is ON (due to b='0'). 
Because the nMOS transistors are connected in series (two switches in series), it suffices to have one of 
them OFF (open) for the branch to be disconnected from GND. On the other hand, the pMOS transistors 
are associated in parallel (switches in parallel), so it suffices to have one of them ON (closed) for that 
section to be connected to V_DD. In summary, y is connected to V_DD (and disconnected from GND), hence 
resulting in y='1' (=V_DD) at the output. Only when both nMOS transistors are ON (and consequently both 
pMOS are OFF) will the output be connected to GND. This situation requires a=b='1', so the circuit 
indeed computes the NAND function, y=(a·b)'. 
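
The switch-level reasoning above can be mirrored in a few lines. This is only a behavioral sketch (ideal switches, no delays); the function names are hypothetical.

```python
def cmos_nand(a, b):
    """Switch-level model of the 2-input CMOS NAND described above."""
    pull_down = (a == 1) and (b == 1)   # series nMOS: both must be ON
    pull_up = (a == 0) or (b == 0)      # parallel pMOS: one ON is enough
    assert pull_up != pull_down         # exactly one network conducts
    return 1 if pull_up else 0

def cmos_and(a, b):
    """AND = NAND followed by a CMOS inverter."""
    return 1 - cmos_nand(a, b)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, cmos_nand(a, b), cmos_and(a, b))
```

The internal assertion reflects a basic property of static CMOS gates: for valid inputs, the pull-up and pull-down networks never conduct at the same time.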


EXAMPLE 4.2 TIMING DIAGRAM OF A COMBINATIONAL CIRCUIT 
Consider the circuit shown in Figure 4.11(a). 
a. Write the expression for y. 


b. Suppose that the circuit is submitted to the stimuli a, b, and c depicted in the first three 
waveforms of Figure 4.11(b), where every time slot is 2 ns wide. Consider that the propagation delays 
through the AND and NAND gates are t_p_AND=4 ns and t_p_NAND=3 ns with the same value for 
the up and down transitions. Draw the corresponding waveforms for x and y, adopting the 
simplified timing diagram style of Figure 4.8(b). 


SOLUTION 
a. y=(x·c)'=(a·b·c)'. 


b. The resulting waveforms are included in Figure 4.11(b). Gray shades were used again to 
highlight the propagation delays (4 ns and 3 ns). Note that the delay sometimes gets 
accumulated (7 ns). 

FIGURE 4.11. Circuit and timing diagram of Example 4.2. 


4.4 OR and NOR Gates 


The next pair of gates, called OR/NOR, is depicted in Figure 4.12, which includes their symbols, truth 
tables, and CMOS implementations. Again, NOR requires less hardware than OR (the latter is a com- 
plete NOR plus an inverter). 

As the name says (and the truth table shows), an OR gate produces a '1' at the output when a or b is 
high or, more generally, when any input is high. Therefore, its logical function can be represented by a 
"sum", as shown below, where some OR properties are also included. 

OR function: 


y = a+b (4.10) 

Some OR properties: 

a+0 = a, a+1 = 1, a+a = a, a+a' = 1 (4.11) 


The corresponding CMOS circuit is shown on the right of Figure 4.12(a), which consists of a NOR 
circuit plus a CMOS inverter. 

A NOR gate is shown in Figure 4.12(b), which produces the complemented version of the OR gate. 

NOR function: 


y=(a+b)’ (4.12) 


Here again a simpler circuit results than that for the OR function because NOR is also an inverting 
gate, and all transistors (as seen in Section 4.1) are inverters by nature. 

The NOR circuit on the right of Figure 4.12(b) operates as follows. Suppose that a='0' (=0 V) and b='1' 
(=V_DD). In this case, the left nMOS transistor is ON (due to b='1') and the right one is OFF (due to a='0'). 
Reciprocally, the top pMOS is ON (due to a='0') and the bottom pMOS is OFF (due to b='1'). Because the 
pMOS transistors are connected in series (two switches in series), it suffices to have one of them OFF (open) 
for the branch to be disconnected from V_DD. On the other hand, the nMOS transistors are associated in parallel 
(switches in parallel), so it suffices to have one of them ON (closed) for that section to be connected to GND. In 
summary, y is connected to GND (and disconnected from V_DD), hence resulting in y='0' (=0 V) at the output. 
Only when both nMOS transistors are OFF (and consequently both pMOS are ON) will the output be 
connected to V_DD. This situation requires a=b='0', so the circuit indeed computes the NOR function, y=(a+b)'. 

FIGURE 4.12. (a) OR gate and (b) NOR gate (symbol, truth table, and CMOS circuit). 


EXAMPLE 4.3 TWO-LAYER CIRCUIT 


Figure 4.13 shows two 2-layer (or 2-level) circuits. The circuit in Figure 4.13(a) is an AND-OR 
circuit because the first layer (also called input layer) consists of AND gates, while the second layer 
(also called output layer) is an OR gate. The reciprocal arrangement (that is, OR-AND) is shown in 
Figure 4.13(b). Find the expression for y in each one. 


SOLUTION 


Figure 4.13(a): The upper AND gate produces a·b, while the lower one produces c·d. These terms 
are then ORed by the output gate, resulting in y=a·b+c·d. Because the logic AND and OR 
operations are represented by the mathematical product and addition symbols (though they have different 
meanings here), the equation looks like a regular sum of product terms, so this format is referred to 
as sum-of-products (SOP). 

Figure 4.13(b): The upper OR gate produces a+b, while the lower one produces c+d. These terms 
are then ANDed by the output gate, resulting in y=(a+b)·(c+d). For a reason analogous to that 
above, this format is referred to as product-of-sums (POS). 

FIGURE 4.13. Two-layer (a) AND-OR and (b) OR-AND circuits of Example 4.3. 
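
To see that the two arrangements of Example 4.3 are different functions, the sketch below evaluates both expressions for all 16 input combinations and counts how many produce '1'. The function names are hypothetical.

```python
from itertools import product

def sop(a, b, c, d):
    """AND-OR circuit of Figure 4.13(a): y = a.b + c.d"""
    return (a & b) | (c & d)

def pos(a, b, c, d):
    """OR-AND circuit of Figure 4.13(b): y = (a+b).(c+d)"""
    return (a | b) & (c | d)

inputs = list(product((0, 1), repeat=4))
ones_sop = sum(sop(*v) for v in inputs)   # input combinations with y = '1'
ones_pos = sum(pos(*v) for v in inputs)
print(ones_sop, ones_pos)
```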

EXAMPLE 4.4 PSEUDO THREE-LAYER CIRCUIT 


Even though the circuit of Figure 4.14 looks like a 3-layer circuit, inverters are normally not consid- 
ered as constituting a layer. Indeed, they are considered as part of existing gates (for instance, recall 
that intrinsic inverters are needed to construct AND gates). The time delays that they cause, how- 
ever, are taken into consideration in the timing analysis. Write the expression for y in Figure 4.14. 


FIGURE 4.14. Pseudo three-layer circuit of Example 4.4. 


SOLUTION 


The AND gates produce (from top to bottom) a'·b, b·c, and a·b'·c'. These terms are then ORed by 
the output gate, resulting in y=a'·b+b·c+a·b'·c'. 


4.5 XOR and XNOR Gates 


Figure 4.15 shows XOR and XNOR gates, both with symbol, truth table, and respective CMOS imple- 
mentation. For the XOR gate, an additional implementation, based on transmission gates (studied in 
Chapter 10), is also included. 

As the truth table in Figure 4.15(a) shows, the XOR gate produces a '1' when the inputs are different, or 
a '0' otherwise. More generally, the XOR gate implements the odd parity function, that is, produces y='1' 
when the number of inputs that are high is odd. This function is represented by the operator ⊕, that is, 
y=a⊕b, which has an equivalent representation using AND/OR operations as follows. 

XOR function: 


y = a⊕b = a'·b + a·b' (4.13) 


This gate has several interesting properties (listed below), which are useful in the implementation of 
several circuits described later. 


a⊕0 = a (4.14a) 
a⊕1 = a' (4.14b) 
a⊕a = 0 (4.14c) 
a⊕a⊕a = a (4.14d) 
a⊕a' = 1 (4.14e) 
(a⊕b)' = a'⊕b = a⊕b' (4.14f) 
(a⊕b)⊕c = a⊕(b⊕c) (4.14g) 
a·(b⊕c) = a·b⊕a·c (4.14h) 


FIGURE 4.15. (a) XOR gate and (b) XNOR gate (symbol, truth table, and CMOS circuit). 


A CMOS circuit for the XOR gate is shown on the right of Figure 4.15(a). Note that a, a', b, and b' are 
connected to nMOS and pMOS transistors in such a way that, when a=b, one of the nMOS branches is 
ON and both pMOS branches are OFF, hence connecting y to GND (y='0'). On the other hand, when 
a≠b, both nMOS branches are OFF and one pMOS branch is ON, then connecting y to V_DD (y='1'). 

The same kind of information is contained in Figure 4.15(b), which presents an XNOR gate. As the 
truth table shows, the XNOR gate produces a '1' when the inputs are equal or a '0' otherwise. More gen- 
erally, the XNOR gate implements the even parity function, that is, produces y='1' when the number of 
inputs that are high is even. This function is represented by the complement of the XOR function, so the 
following results: 

XNOR function: 


y = (a⊕b)' = a'·b' + a·b (4.15) 


A CMOS circuit for the XNOR gate is also included in Figure 4.15(b), which is very similar to the 
XOR circuit. However, in the XNOR gate the output is '1' when a=b, whereas in the XOR gate it is '1' 
when a≠b. 

EXAMPLE 4.5 XOR PROPERTIES 


Prove Equations 4.14(a)-(c) above (the others are left to the exercises section). 


SOLUTION 


The proofs below employ Equation 4.13 plus AND/OR properties (Equations 4.8 and 4.11). 

For Equation 4.14(a): a⊕0 = a'·0 + a·0' = 0 + a·1 = a 

For Equation 4.14(b): a⊕1 = a'·1 + a·1' = a' + 0 = a' 

For Equation 4.14(c): a⊕a = a'·a + a·a' = 0 + 0 = 0 
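
Because only a few Boolean variables are involved, all of the properties in Equation 4.14 can also be confirmed by exhaustive evaluation. The sketch below builds XOR from Equation 4.13 and tests each property for every combination of a, b, and c. It complements the algebraic proofs rather than replacing them, and the function names are hypothetical.

```python
from itertools import product

def xor(a, b):
    """Equation 4.13: a XOR b = a'.b + a.b'"""
    return ((1 - a) & b) | (a & (1 - b))

def check_xor_properties():
    """Return True if Equations 4.14(a)-(h) hold for every input combination."""
    for a, b, c in product((0, 1), repeat=3):
        na, nb = 1 - a, 1 - b
        checks = [
            xor(a, 0) == a,                                # (4.14a)
            xor(a, 1) == na,                               # (4.14b)
            xor(a, a) == 0,                                # (4.14c)
            xor(xor(a, a), a) == a,                        # (4.14d)
            xor(a, na) == 1,                               # (4.14e)
            (1 - xor(a, b)) == xor(na, b) == xor(a, nb),   # (4.14f)
            xor(xor(a, b), c) == xor(a, xor(b, c)),        # (4.14g)
            (a & xor(b, c)) == xor(a & b, a & c),          # (4.14h)
        ]
        if not all(checks):
            return False
    return True

print(check_xor_properties())
```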


4.6 Modulo-2 Adder 


We saw in Section 3.1 that the function implemented by a binary adder (also called modulo-2 adder) is 
the odd parity function. Therefore, because that is also the function implemented by the XOR gate, we 
conclude that XOR and modulo-2 adder are indeed the same circuit (recall, however, that in a binary 
adder additional circuitry is needed to produce the carry-out bit, so complete equivalence between binary 
adder and XOR only occurs when the carry-out bit is of no interest). This fact is illustrated in Figure 4.16 
for 2 and N inputs (two options are shown for the latter). 


EXAMPLE 4.6 N-BIT PARITY FUNCTION 


Consider the N-bit XOR gate depicted at the bottom right of Figure 4.16(b). Prove that it computes 
the odd parity function and that it is a modulo-2 adder (without carry-out, of course). 


SOLUTION 


Without loss of generality, we can reorganize the inputs as two sets, say a_1, ..., a_n, containing all inputs 
that are '1', and a_(n+1), ..., a_N, containing all inputs that are '0'. Applying Equation 4.14(b) with a='1' 
successive times to the first set, the result is '0' when n is even or '1' when it is odd. Similarly, the 
application of Equation 4.14(a) with a='0' successive times to the second set produces '0' regardless 
of its number of inputs. Finally, applying Equation 4.14(a) to these two results, '0' or '1' is obtained, 
depending on whether n is even or odd, respectively. 


FIGURE 4.16. Modulo-2 adder (without carry-out) and XOR gate are the same circuit. 

EXAMPLE 4.7 BINARY ADDITION VERSUS OR OPERATION 


Consider the following binary operations (even though quotes are omitted, these are binary numbers): 
(i) 1+1, (ii) 1+1+1, and (iii) 1+1+1+1. 


a. Find the results when '+' is performed by a binary adder. 


b. Find the results when '+' is performed by an OR gate. 


SOLUTION 
a. (i) 1+1=10 (decimal=2), (ii) 1+1+1=11 (decimal=3), (iii) 1+1+1+1=100 (decimal=4). 
b. (i) 1+1=1 (1 OR 1=1), (ii) 1+1+1=1 (1 OR 1 OR 1=1), (iii) 1+1+1+1=1. 
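
Examples 4.6 and 4.7 can be cross-checked with the sketch below, which compares three ways of combining single-bit operands: an N-input XOR, an N-input OR, and a full binary addition. The helper names are hypothetical.

```python
from functools import reduce
from itertools import product

def xor_n(bits):
    """N-input XOR (modulo-2 adder without carry-out)."""
    return reduce(lambda acc, bit: acc ^ bit, bits, 0)

def or_n(bits):
    """N-input OR gate."""
    return reduce(lambda acc, bit: acc | bit, bits, 0)

def binary_add(bits):
    """Full binary addition of single-bit operands, as a binary string."""
    return format(sum(bits), "b")

# Example 4.6: XOR equals the least significant bit of the binary sum
for n in range(1, 6):
    for bits in product((0, 1), repeat=n):
        assert xor_n(bits) == sum(bits) % 2

# Example 4.7: all operands are '1'
for n in (2, 3, 4):
    ones = (1,) * n
    print(n, binary_add(ones), or_n(ones), xor_n(ones))
```

The last column (XOR) matches the rightmost bit of the binary sum in every row, while the OR result is always 1.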


4.7 Buffer 


The last fundamental gates fall in the buffer category and are divided into three types: regular buffers 
(described in this section), tri-state buffers (Section 4.8), and open-drain buffers (Section 4.9). What they 
have in common is the fact that none of them performs any logical transformation (thus y=x) except for 
inversion sometimes. 

A regular buffer is depicted in Figure 4.17(a). As shown in the truth table, this circuit does not perform any 
logical transformation (except for inversion in some cases), so its function can be represented as follows. 

Buffer function: 


y = x (4.16) 


A typical implementation is also shown in Figure 4.17(a), which consists of cascaded inverters. 
Depending on the number of inverters, the resulting buffer can be noninverting or inverting. 

Two typical applications for buffers are depicted in Figures 4.17(b) and (c). That in (b) serves to increase 
a gate's current-driving capability, so the signal from one gate can be fed to a larger number of gates. 
That in (c) shows a buffer employed to restore a "weak" signal (in long-distance transmissions or noisy 
environments). Buffers with a large current capacity (normally many milliamperes) are also referred to as 
drivers (employed as line drivers and bus drivers, for example). 


FIGURE 4.17. (a) Regular buffer (symbol, truth table, and implementation); (b) Buffer used to increase current 
driving capacity; (c) Buffer employed to restore a weak signal. 


4.8 Tri-State Buffer 


The circuit in Figure 4.18 is called a tri-state buffer (also three-state buffer or 3-state buffer) because it has 
a third state, called 'Z', which represents high impedance. In other words, when the circuit is enabled 
(ena='1'), it operates as a regular buffer, whereas when disabled (ena='0'), the output node is discon- 
nected from the internal circuitry. Its logical function is therefore that shown below. 

Tri-state buffer function: 


y = ena'·Z + ena·x (4.17) 


A typical CMOS circuit for this buffer is also included in Figure 4.18, which consists of a CMOS inverter 
followed by a C²MOS inverter (C²MOS logic will be studied in Chapter 10). Note that when ena='1', both 
inner transistors are ON (because '1' is applied to the gate of the nMOS and '0' to the gate of the pMOS), 
so the outer transistors are connected to the output node (y), rendering a regular CMOS inverter. 
However, when ena='0', both inner transistors are turned OFF (because now the nMOS receives '0' and the 
pMOS receives '1'), causing node y to be disconnected from all transistors (node y is left "floating"). 

The construction of multi-bit buffers is straightforward. As illustrated in Figure 4.19, an N-bit 
buffer consists simply of N single-bit units sharing the same buffer-enable (ena) signal. Two equivalent 
representations are shown on the right of Figure 4.19, where the second one includes references to the 
bit indexes. Note that in both a thick line is used to indicate a multi-bit path, while for ena it is still a thin 
line (only one bit). 


FIGURE 4.18. Tri-state buffer (symbol, truth table, and CMOS-C²MOS implementation). 


FIGURE 4.19. Construction of a multi-bit tri-state buffer (two equivalent representations are shown on 
the right). 

A typical application for tri-state buffers is in the connection of multiple circuits to shared physical 
resources (like data buses, illustrated in the example below). 


EXAMPLE 4.8 TRI-STATE BUS DRIVERS 


Draw a diagram for a system that must connect 4 similar 16-bit circuits to a common 16-bit 
data bus. 


SOLUTION 


The solution diagram is shown in Figure 4.20 where 4 tri-state buffers of 16 bits each are employed. 
The number of bits in the thick lines is explicitly marked. In an actual system, a bus master (not 
shown in the figure) is needed to manage bus requests, generating proper access-enable (ena) 
signals. Obviously only one unit can hold the bus at a time while all the others remain disconnected 
from it (in the high-impedance state, 'Z'). 


FIGURE 4.20. Tri-state bus drivers of Example 4.8. 
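
The bus-sharing rule of Example 4.8 can be sketched behaviorally as follows. The model treats each 16-bit tri-state buffer as a function that returns either its input word or 'Z', and it flags any attempt to enable two drivers at once. The function names and the data words are hypothetical, and the bus master itself is not modeled.

```python
def tri_state(x, ena):
    """Tri-state buffer: passes x when enabled, otherwise 'Z' (high impedance)."""
    return x if ena == 1 else "Z"

def resolve_bus(outputs):
    """Resolve a shared line driven by several tri-state outputs.

    Returns 'Z' if nobody drives the line and raises if two or more
    drivers are enabled at once (treated as contention even if they agree).
    """
    driven = [v for v in outputs if v != "Z"]
    if len(driven) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return driven[0] if driven else "Z"

def bus_read(words, enables):
    """One 16-bit word and one enable bit per circuit."""
    return resolve_bus([tri_state(w, e) for w, e in zip(words, enables)])

words = [0x1111, 0x2222, 0x3333, 0x4444]   # hypothetical circuit outputs
print(hex(bus_read(words, [0, 0, 1, 0])))
```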


4.9  Open-Drain Buffer 


A final buffer is depicted in Figure 4.21. It is called open-drain buffer (or OD buffer) because it has at the 
output a MOS transistor (generally nMOS type) with its drain disconnected from any other circuit 
element. This point (drain) is wired to one of the IC's pins, allowing external connections to be made as 
needed. A pull-up resistor (or some other type of load), connecting this point to V_DD, is required for 
proper circuit operation. 

Typical usages for OD buffers include the construction of wired AND/NOR gates and the provision 
of higher currents/voltages. Both cases are illustrated in Figure 4.21. In Figure 4.21(a), N open-drain 
buffers are wired together to form a NOR gate (note that it suffices to have one x high for y to be low, and 
that V_DD2 can be different from V_DD1). Figure 4.21(b) depicts an external load (for example, a 12 V/24 mA 
mini-relay) that requires a larger current and a higher voltage than those usually provided by the 
corresponding logic family. 
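
A behavioral sketch of the wired connection is given below. It assumes that each open-drain stage pulls the shared line to GND when its input x is '1' and releases it otherwise, which is consistent with the remark that one high x suffices for y to be low. The function names are hypothetical.

```python
from itertools import product

def open_drain(x):
    """Open-drain stage: pulls the line to GND when x='1', else releases it."""
    return "LOW" if x == 1 else "OPEN"

def wired_line(inputs):
    """Shared line with a pull-up resistor and N open-drain outputs."""
    states = [open_drain(x) for x in inputs]
    return 0 if "LOW" in states else 1   # pull-up wins only if all are OPEN

# The wired connection behaves as an N-input NOR gate
for xs in product((0, 1), repeat=3):
    assert wired_line(xs) == (0 if any(xs) else 1)

print(wired_line([0, 0, 0]), wired_line([0, 1, 0]))
```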


FIGURE 4.21. Open-drain buffer applications: (a) wired NOR gate, y=(x1+x2+...+xN)'; (b) 3.3 V IC driving a 12 V/24 mA external load. 


FIGURE 4.22. Positive-edge triggered DFF (symbol and truth table). 


4.10 D-Type Flip-Flop 


Having finished the introduction of fundamental logic gates, we now describe the most fundamental 
logic register. 

Digital circuits can be classified into two large groups called combinational circuits and sequential cir- 
cuits. A circuit is combinational when its output depends solely on its current inputs. For example, all 
circuits discussed above are combinational. This type of circuit obviously does not require memory. In 
contrast, a circuit is sequential if its output is affected by previous system states, in which case storage ele- 
ments are necessary. A clock signal is then also needed to control the system evolution (to set the timing). 
Counters are good examples in this category because the next output depends on the present output. 

The most fundamental use for registers is in the implementation of the memory needed in sequential 
circuits. The type of register normally used in this case is called D-type flip-flop (DFF). 

Indeed, it will be shown in Chapter 13 that registers can be divided into two groups called latches 
and flip-flops. Latches are further divided into two types called SR and D latches. Similarly, flip-flops are 
divided into four types called SR, D, T, and JK flip-flops. However, for the reason mentioned above (con- 
struction of sequential circuits), the D-type flip-flop (DFF) is by far the most commonly used, covering 
almost all digital applications where registers are needed. 

The symbol and truth table for a positive-edge triggered DFF are shown in Figure 4.22. The circuit has 
two inputs called d (data) and clk (clock), and two outputs called q and q' (where q' is the complement 
of q). Its operation can be summarized as follows. Every time the clock changes from '0' to '1' (positive 
edge), the value of d is copied to q; during the rest of the time, q simply holds its value. In other words, 
the circuit is "transparent" at the moment when a positive edge occurs in the clock (represented by q*=d 
in the truth table, where q* indicates the circuit's next state), and "opaque" at all other times (that is, 
q*=q). 

This type of circuit can be constructed in several ways, which will be described at length in Chapter 13. 
It can also be constructed with regular NAND gates as shown in Figure 4.23(a). This, however, is only for 
illustration because that is not how it is done in actual integrated circuits (though you might have seen it 
around). An example of actual implementation, borrowed from Figure 13.19, is displayed in Figure 4.23(b), 
which shows a DFF used in the Itanium microprocessor. 


Timing diagrams for sequential circuits 


Three timing diagram options were presented in Section 4.2 (Figure 4.8) and illustrated subsequently in 
Examples 4.1 and 4.2 for combinational circuits. We conclude this section with a similar analysis but now 
for sequential circuits (a DFF in this section, then larger DFF-based circuits in the sections that follow). 
The diagrams, however, are simplified, because they do not include certain time parameters, like setup 
and hold times (these will be seen in Chapter 13—see Figure 13.12). 

A typical timing diagram for a D-type flip-flop is shown in Figure 4.24, where the style of Figure 4.8(c) 
was adopted. As can be seen, there are two main parameters, both called t_pCQ (one down, the other up), 
which represent the propagation delay from clk to q (that is, the time needed for q to change when the 
proper transition occurs in clk). The up and down delays are not necessarily equal, but in general the 
worst (largest) of them is taken as the DFF's only t_pCQ value. The usage of this parameter is illustrated in 
the example that follows. 


FIGURE 4.23. DFF implemented with regular NAND gates (only for illustration because this is not how it 
is done in actual ICs). Actual implementations, like that in (b), which shows a DFF employed in the Itanium 
processor, will be described in Chapter 13. 


FIGURE 4.24. Simplified timing diagram for a DFF. 


EXAMPLE 4.9 FREQUENCY DIVIDER 


Figure 4.25(a) shows a positive-edge triggered DFF with an inverted version of q connected to d. Using 
the clock as reference and the style of Figure 4.8(b), draw the waveform for the output signal, q, and 
compare its frequency, f_q, to the clock's, f_clk. Assume that the propagation delays are t_pCQ=3 ns for the 
DFF and t_p=1 ns for the inverter. 


FIGURE 4.25. Frequency divider of Example 4.9. 


SOLUTION 


The solution is depicted in Figure 4.25(b), where the simplified timing diagram style of Figure 4.8(b) 
was employed. Because this circuit is a positive-edge DFF, arrows were marked in the clock 
waveform to highlight the only points where the circuit is transparent. The DFF's initial state was assumed 
to be q='0', so d='1', which is copied to q at the next positive clock edge, producing q='1' after 3 ns. 
The new value of q now produces d='0' after 1 ns. Then, at the next positive clock transition, d is 
again copied to q, this time producing q='0' and so on. Gray shades were used in the figure to 
highlight the propagation delays (1 ns and 3 ns). Comparing the waveforms of q and clk, we observe that 
f_q=f_clk/2, so this circuit is a divide-by-two frequency divider (typically used in the implementation 
of counters). 
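
The sequence of events in this solution can be reproduced with the sketch below. The delays follow Example 4.9, while the clock period t_clk=10 ns is an assumed value (the example does not depend on it, provided that d settles before the next positive edge). The function name is hypothetical.

```python
def divide_by_two(num_edges, t_clk=10, t_pcq=3, t_p_inv=1):
    """Return (time_ns, signal, value) events for a DFF with d = q'.

    t_clk (clock period, ns) is an assumed value; t_pcq and t_p_inv follow
    Example 4.9. Requires t_pcq + t_p_inv < t_clk so that d settles before
    the next positive clock edge.
    """
    q, d = 0, 1
    events = []
    for n in range(num_edges):
        edge = n * t_clk
        q = d                          # positive edge: d is copied to q
        events.append((edge + t_pcq, "q", q))
        d = 1 - q                      # inverter output follows q
        events.append((edge + t_pcq + t_p_inv, "d", d))
    return events

for event in divide_by_two(4):
    print(event)
```

Because q changes once per clock period, one full cycle of q takes two clock periods, which is the divide-by-two behavior noted above.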


For now, what is important to understand is how a DFF operates, as well as the basic types of circuits 
that can be constructed with it, because these concepts will be needed in Chapters 6 and 7 (the rest can 
wait until we reach Chapter 13). For that purpose, three specially selected sections are included below, 
where the following DFF-based circuits are described: Shift registers (Section 4.11), counters (Section 4.12), 
and pseudo-random sequence generators (Section 4.13). 


4.11 Shift Register 


A common DFF application is shown in Figure 4.26(a), which shows a 4-stage shift register (SR). As will 
be further seen in Section 14.1, SRs are used for storing and/or delaying data. 


FIGURE 4.26. (a) Four-stage shift register; (b) Simplified representation. 


The operation of a shift register is very simple: Each time a positive clock edge occurs, the data vector 
advances one position (assuming that it employs positive-edge DFFs). Hence in the case of Figure 4.26(a) 
each input bit (d) reaches the output (q) after 4 positive clock edges have occurred. Note that the DFFs in 
this case have an additional input, called rst (reset), which forces all DFF outputs to '0' when asserted. 

A simplified representation is also included in Figure 4.26(b), where the flip-flops are represented by 
little boxes without any reference to clock or reset. 
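
The shifting behavior can be sketched with a list that holds the DFF outputs, where each call represents one positive clock edge. This is a purely functional model (no propagation delays), and the function names are hypothetical.

```python
def shift_register_step(state, d):
    """One positive clock edge: every stage copies its left neighbor."""
    return [d] + state[:-1]

def run_shift_register(d_stream, stages=4):
    """Apply one input bit per positive clock edge, starting from reset."""
    state = [0] * stages              # rst asserted: all outputs '0'
    history = []
    for d in d_stream:
        state = shift_register_step(state, d)
        history.append(tuple(state))
    return history

# A single '1' followed by zeros travels along the four stages
for row in run_shift_register([1, 0, 0, 0, 0]):
    print(row)
```

The '1' captured at the first edge appears at the last stage after the fourth edge and leaves the register at the fifth.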


EXAMPLE 4.10 SHIFT REGISTER OPERATION 


Suppose that the signals rst, clk, and d shown in the first three waveforms of Figure 4.27 are applied 
to the 4-stage shift register of Figure 4.26(a). Draw the corresponding signals at all DFF outputs 
(q1, q2, q3, and q4). Assume a fixed propagation delay t_pCQ≠0 for all DFFs. 


FIGURE 4.27. Timing diagram for the SR of Example 4.10. 


SOLUTION 


The solution is depicted in Figure 4.27, where d consists of a single pulse that travels along the SR 
as the clock ticks, reaching the output after 4 positive clock transitions. A fixed value was used for 
4.12 Counters 


Like the other two sequential circuits introduced above (flip-flops and shift registers), counters will also 
be studied at length later (Chapter 14). However, because they are the most common of all sequential 
circuits (therefore the largest application for DFFs), an earlier introduction is deserved. 

Counters can be divided into two groups called synchronous and asynchronous. In the former, the clock 
signal is connected to the clock input of all flip-flops, whereas in the latter the output of one flip-flop 
serves as input (clock) to the next. 

Counters can also be divided into full-scale and partial-scale counters. The former is modulo-2^N, because 
it has 2^N states (where N is the number of flip-flops), while the latter has M<2^N states. For example, with 
N=4 DFFs, the largest counter has 2^4=16 states (counting from 0 to 15, for example), which is then a 
modulo-2^4 counter. If the same four DFFs are employed to construct a decimal counter (counting from 0 
to 9), for example, thus with only 10 states, then it is a modulo-M counter (because 10<16). Only the first 
category (modulo-2^N), which is simpler, will be seen in this introduction. 


Synchronous counter 


A popular architecture for synchronous counters is shown in Figure 4.28(a). This circuit contains one 
DFF plus two gates (AND +XOR) per stage. Owing to its modularity (all cells are alike), this circuit can 
be easily extended to any number of stages (thus implementing counters with any number of bits) by 
simply cascading additional standard cells. 

The corresponding timing diagram is depicted in Figure 4.28(b), where q2q1q0="000" (decimal 0) is the 
initial state. After the first positive clock edge, the first DFF changes its state (as in Example 4.9), thus 
resulting in q2q1q0="001" (decimal 1). At the next clock edge, the first two DFFs change their state, thus 
now resulting in q2q1q0="010" (decimal 2) and so on. In summary, q0 changes at every 2^0=1 positive clock 
transition, q1 every 2^1=2 clock transitions, q2 every 2^2=4 clock transitions, and so on, hence resulting in 
a sequential upward binary counter. 


FIGURE 4.28. Synchronous counter: (a) Circuit; (b) Timing diagram. 


FIGURE 4.29. Asynchronous counter. 

Note that the simplicity of modulo-2^N counters is due to the fact that they are self-resetting, that is, 
upon reaching the last state, 2^N−1, the circuit automatically restarts from zero (see Figure 4.28(b)). 
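
The counting and self-resetting behavior described above can be checked with a short behavioral model. The sketch below is written in Python rather than in a hardware description language, and it assumes that each cell of Figure 4.28(a) toggles its DFF through the XOR gate whenever all less significant outputs are '1' (the AND chain). The function names are illustrative only.

```python
def sync_counter_step(q):
    """Return the next state of an N-bit synchronous counter.

    q is a list of bits [q0, q1, ...], least significant bit first.
    Assumed cell: d_i = q_i XOR enable_i, enable_(i+1) = enable_i AND q_i.
    """
    enable = 1                       # the first stage toggles at every clock edge
    next_q = []
    for bit in q:
        next_q.append(bit ^ enable)  # XOR gate feeding the DFF input
        enable &= bit                # AND gate passing the enable to the next cell
    return next_q                    # all DFFs are updated by the same clock edge


def sync_counter_sequence(n_bits, n_edges):
    """Decimal values seen at the output, starting from the all-zero state."""
    q = [0] * n_bits
    values = [0]
    for _ in range(n_edges):
        q = sync_counter_step(q)
        values.append(sum(bit << i for i, bit in enumerate(q)))
    return values


print(sync_counter_sequence(3, 9))
```

With three stages the model counts from 0 to 7 and then restarts from 0 without any extra reset logic.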


Asynchronous counter 


The advantage of asynchronous counters is that they require a little less hardware (less silicon space) 
than their synchronous counterpart. On the other hand, they are a little slower. 

A modulo-2^N sequential upward asynchronous counter is shown in Figure 4.29. This too is a 
modular structure, so it can be easily extended to any number of bits. Because in this case the 
output of one stage acts as clock to the next one, the clock only reaches the last stage after it 
propagates through all the others, and that is the reason why this circuit is slower than the 
synchronous version. 

Note also that each cell is similar to that in Example 4.9, that is, a divide-by-two circuit, so f_q0=f_clk/2, 
f_q1=f_q0/2, f_q2=f_q1/2, etc., so the timing diagram for this circuit is similar to that in Figure 4.28(b) but with 
accumulated propagation delays. 


EXAMPLE 4.11 DOWNWARD ASYNCHRONOUS COUNTER 


Figure 4.30 shows the same asynchronous counter of Figure 4.29 but with q used as clock for the 
succeeding stage instead of q'. Prove that this is a sequential downward counter. 


SOLUTION 


The timing diagram for this circuit is shown in Figure 4.30(b), which can be obtained as follows. 
Suppose that the present state is q2q1q0="000" (decimal 0), so d2=d1=d0='1'. Because at the next 
positive clock edge d0 is copied to q0, q0='1' then results. However, this upward transition of q0 is a positive 
edge for the second stage, which then copies d1 to q1, resulting in q1='1'. This is also a positive clock 
edge for the succeeding stage, so d2 is copied to q2, resulting in q2='1'. In other words, q2q1q0="111" 
(decimal 7) is the system's next state. Similar reasoning leads to the conclusion that q2q1q0="110" 
(decimal 6) then follows, and so on, until q2q1q0="000" (decimal 0) is reached again. Note in the 
timing diagram that the delay accumulates because each stage depends on information received from 
its preceding stages. 


FIGURE 4.30. Downward asynchronous counter of Example 4.11. 
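
The reasoning used for the upward and downward asynchronous counters can be reproduced with a small behavioral model. The Python sketch below assumes that every stage is a divide-by-two cell (d=q') triggered by positive edges, and it ignores propagation delays, so it shows only the sequence of states and not the accumulated delay. The names ripple_step and ripple_sequence are illustrative.

```python
def ripple_step(q, use_q_as_clock):
    """Apply one positive edge of the external clock to a ripple counter.

    q is [q0, q1, ...], least significant bit first. Every stage is a
    divide-by-two cell (d = q'), so it toggles when it sees a positive edge.
    use_q_as_clock=False: the next stage is clocked by q' (upward counter).
    use_q_as_clock=True:  the next stage is clocked by q  (downward counter).
    """
    q = list(q)
    positive_edge = True             # the external clock edge reaches stage 0
    for i, old in enumerate(q):
        if not positive_edge:
            break                    # later stages see no clock edge
        q[i] = 1 - old               # the clocked stage copies d = q'
        went_up = (old == 0)         # did q make a '0' to '1' transition?
        positive_edge = went_up if use_q_as_clock else not went_up
    return q


def ripple_sequence(n_bits, n_edges, use_q_as_clock):
    """Decimal values produced from the all-zero state."""
    q = [0] * n_bits
    values = [0]
    for _ in range(n_edges):
        q = ripple_step(q, use_q_as_clock)
        values.append(sum(bit << i for i, bit in enumerate(q)))
    return values


print(ripple_sequence(3, 8, use_q_as_clock=False))  # upward counter
print(ripple_sequence(3, 8, use_q_as_clock=True))   # downward counter
```

Clocking each stage from q instead of q' is the only change between the two runs, which mirrors the difference between Figures 4.29 and 4.30.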


FIGURE 4.31. (a) Four-stage pseudo-random sequence generator; (b) Simplified representation. 


4.13 Pseudo-Random Sequence Generator 


A final sequential circuit (thus employing flip-flops) example is shown in Figure 4.31(a). This circuit is 
called linear-feedback shift register (LFSR) and implements a pseudo-random sequence generator. Other LFSR 
sizes will be seen in Section 14.7. 

The circuit in Figure 4.31 consists of a shift register (seen in Section 4.11) with two taps connected 
to an XOR gate whose output is fed back to the shift register's input. This circuit is represented by the 
polynomial 1+x^3+x^4 because the taps are derived after the 3rd and 4th registers. 

The generated sequence (d) has a pseudo-random distribution, which is useful in communications 
and computer applications, like the construction of data scramblers (Section 14.8). 

The shift register must start from a non-zero state, so the initialization can be done, for example, by 
presetting all flip-flops to '1'. Note in Figure 4.31 that the reset signal is connected to the preset (pre) input 
of all DFFs, which is the counterpart of rst; that is, it forces the outputs to '1' when asserted. 

A simplified representation is included in Figure 4.31(b), where again the DFFs are represented by 
little boxes, as in Figure 4.26(b), without any reference to clock or reset/preset. Note that in this case 
a modulo-2 adder was used instead of an XOR gate to reinforce the fact that they are indeed the same 
circuit (as seen in Section 4.6). 


EXAMPLE 4.12 PSEUDO-RANDOM SEQUENCE GENERATOR 


Consider the 4-bit LFSR-based pseudo-random sequence generator of Figure 4.31. Starting with 
"1111", list the corresponding sequence of values that it produces, and also verify that it indeed 
contains 2^4−1 distinct states. 


SOLUTION 


The solution is shown in Figure 4.32. Starting with q1q2q3q4="1111", one can easily observe that 
the next state is "0111", then "0011", and so on, until "1111" again occurs. The table shows 2^4−1=15 
distinct values (states). 


FIGURE 4.32. Pseudo-random sequence generation of Example 4.12. 
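
The sequence of Example 4.12 can also be generated programmatically. The Python sketch below models the 4-stage LFSR under the assumptions stated in the text: the XOR of the 3rd and 4th register outputs is fed back to the first register, and the state is written as q1q2q3q4. The function name lfsr_states and its parameters are illustrative.

```python
def lfsr_states(seed="1111", taps=(3, 4)):
    """List the states q1q2...qN visited by the LFSR until the seed recurs."""
    n_bits = len(seed)
    state = [int(bit) for bit in seed]       # [q1, q2, ..., qN]
    states = []
    for _ in range(2 ** n_bits):             # upper bound on the cycle length
        states.append("".join(str(bit) for bit in state))
        d = 0
        for tap in taps:
            d ^= state[tap - 1]              # XOR (modulo-2 sum) of the tapped outputs
        state = [d] + state[:-1]             # shift by one stage; d enters the first DFF
        if "".join(str(bit) for bit in state) == seed:
            break
    return states


sequence = lfsr_states()
print(sequence)
print(len(sequence), len(set(sequence)))
```

Starting from "1111", the model visits 15 distinct states and never reaches "0000", which is why the register must not be initialized with all zeros.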


4.14 Exercises 


1. Static power consumption #1 


a. Consider the inverter shown in Figure 4.1. It was shown in Figure 4.1(c) that while x='1' the 
circuit consumes static power, so it is not adequate for digital systems. Assuming that R=10 kΩ 
and that the actual output voltage is 0.1 V, with V_DD=5 V, calculate that power. 


b. Calculate the new power consumption in case V_DD were reduced to 3.3 V. 
2. Static power consumption #2 


a. Consider the inverter shown in Figure 4.2. It was shown in Figure 4.2(c) that while x='0' the 
circuit consumes static power, so it is not adequate for digital systems. Assuming that R=10 kΩ 
and that the actual output voltage is 4.9 V, with V_DD=5 V, calculate that power. 


b. Calculate the new power consumption in case V_DD were reduced to 3.3 V. 


3. Static power consumption #3 


a. Consider the CMOS inverter shown in Figure 4.5(a). Assuming that it does not exhibit any 
current leakage when in steady state, what is its static power consumption? 


b. Suppose that the circuit presents a leakage current from V_DD to GND of 1 pA while in steady 
state and biased with V_DD=3.3 V. Calculate the corresponding static power consumption. 


4. Dynamic power consumption 


Suppose that the CMOS inverter of Figure 4.6(a) is biased with V_DD=3.3 V and feeds a load C_L=1 pF. 
Calculate the dynamic power consumption when: 


a. There is no activity (that is, the circuit remains in the same state). 
b. The input signal is a square wave (Figure 1.14) with frequency 1 MHz and 50% duty cycle. 
c. The input signal is a square wave with frequency 1 MHz and 10% duty cycle. 


5. Ideal versus nonideal transistors 


In the analysis of the CMOS inverter of Figure 4.5 it was assumed that the MOS transistors are 
ideal, so they can be represented by ideal switches. Suppose that instead they exhibit an internal 
resistance r_i ≠ 0, which should then be included in series with the switch in Figures 4.5(b)-(c). If the 
load connected to the output node (y) is purely capacitive, will r_i affect the final voltage of y or only 
the time needed for that voltage to settle? Explain. 


6. Noise margins 


With the help of Figure 4.7, determine the noise margins when low (NM_L) and high (NM_H) for the 
following logic families: 


a. 3.3V LVCMOS 
b. 1.8V LVCMOS 
c. 1.2V LVCMOS 


7. All-zero and all-one detectors 


a. An all-zero detector is a circuit that produces a '1' at the output when all input bits are low. Which, 
among all the gates studied in this chapter, is an all-zero detector? 


b. Similar to the case above, an all-one detector is a circuit that produces a '1' at the output when all 
input bits are high. Which, among all the gates studied in this chapter, is an all-one detector? 


8. Three-input CMOS NAND gate 


Draw a CMOS circuit for a 3-input NAND gate (recall that in a NAND gate nMOS transistors are 
connected in series and pMOS transistors are connected in parallel). 


9. Four-input CMOS NOR gate 


Draw a CMOS circuit for a 4-input NOR gate (recall that in a NOR gate nMOS transistors are 
connected in parallel and pMOS transistors are connected in series). 


10. AND/OR circuit #1 


Using AND and/or OR gates with any number of inputs, draw circuits that implement the following 
functions: 


a. y=a·b·c·d 
b. y=a+b+c+d 
c. y=a·b+c·d·e 
d. y=(a+b)·(c+d+e) 


11. AND/OR circuit #2 


Using only 2-input AND and/or OR gates, draw circuits that implement the functions below. 


a. y=a·b·c·d 
b. y=a+b+c+d 
c. y=a·b+c·d·e 
d. y=(a+b)·(c+d+e) 


12. NAND-AND timing analysis 


Suppose that the NAND-AND circuit of Figure E4.12 is submitted to the stimuli also included in the 
figure where every time slot is 10ns wide. Adopting the simplified timing diagram style of 
Figure 4.8(b), draw the corresponding waveforms at nodes x and y for the following two cases: 


a. Assuming that the propagation delays through the gates are negligible. 


b. Assuming that the propagation delays through the NAND and AND gates are 1ns and 2ns, 
respectively. 
FIGURE E4.12. 


13. OR-NOR timing analysis 


Suppose that the OR-NOR circuit of Figure E4.13 is submitted to the stimuli also included in 
the figure where every time slot is 10ns wide. Adopting the simplified timing diagram style of 
Figure 4.8(b), draw the corresponding waveforms at nodes x and y for the following two cases: 


a. Assuming that the propagation delays through the gates are negligible. 


b. Assuming that the propagation delays through the OR and NOR gates are 2ns and 1ns, 
respectively. 


FIGURE E4.13. 


14. NOR-OR timing analysis 


Suppose that the NOR-OR circuit of Figure E4.14 is submitted to the stimuli also included in 
the figure where every time slot is 10ns wide. Adopting the simplified timing diagram style of 
Figure 4.8(b), draw the corresponding waveforms at nodes x and y for the following two cases: 


a. Assuming that the propagation delays through the gates are negligible. 
b. Assuming that the propagation delays through the NOR and OR gates are 1ns and 2ns, respectively. 


FIGURE E4.14. 


15. NAND-only timing analysis 


Suppose that the NAND-only circuit of Figure E4.15 is submitted to the stimuli also included in 
the figure, where every time slot is 10ns wide. Adopting the simplified timing diagram style of 
Figure 4.8(b), draw the corresponding waveforms at nodes x and y for the following two cases: 


a. Assuming that the propagation delays through the gates are negligible. 
b. Assuming that the propagation delay through each NAND gate is 1ns. 


FIGURE E4.15. 


16. XOR properties #1 
Prove Equations 4.14(d)-(h) relative to the XOR gate. 


17. XOR properties #2 


Show that: 
a. 0⊕0⊕0...⊕0=0 
b. 1⊕0⊕0...⊕0=1 
c. 1⊕1⊕0⊕0...⊕0=0 
d. 1⊕1⊕1⊕0⊕0...⊕0=1 
e. a⊕a'⊕b⊕b'=0 


18. Equivalent XOR gate 


Using only inverters, AND and OR gates, draw a circuit that implements the 2-input XOR function. 
19. 3-input XOR gate 


Figure E4.19 shows a 3-input XOR gate. 


a. Write its truth table. 
b. Derive its Boolean expression. 
c. Draw an equivalent circuit using only 2-input XOR gates. 
d. Draw an equivalent circuit using only NAND gates (with any number of inputs). 


FIGURE E4.19. 


20. Modulo-2 addition 


Find the expression of y for each circuit in Figure E4.20. Note that these circuits are binary adders, 
which therefore compute the modulo-2 addition, but carry-out bits are not of interest in this case 
(hence they are just XOR gates). 


FIGURE E4.20. 


21. Truth table #1 
a. Write the truth table for the circuit in Figure E4.21 with the switch in position (1). 
b. Repeat the exercise with the switch in position (2). 


c. Based on your answers above, draw the simplest possible equivalent circuit. 


FIGURE E4.21. 


22. Truth table #2 
Write the truth table for the circuit in Figure E4.22. (After solving it, see Figure 13.2.) 


FIGURE E4.22. 


23. Combinational versus sequential 


Both circuits in the last two exercises have feedback loops, which are typical of sequential circuits 
(such circuits will be discussed in detail in Chapters 13-15; as already mentioned, a sequential 
circuit is one in which the outputs depend on previous system states). Which of these circuits is 
actually sequential? Comment. 


24. Bidirectional bus driver 


The circuit of Figure E4.24 must be connected to a bidirectional 8-bit bus. Comparing this situa- 
tion with that in Example 4.8, we observe that now the circuit must transmit and receive data. What 
type(s) of buffer(s) must be inserted in the region marked with a circle to construct the appropriate 
connections? Does RX need to go into high-impedance mode? Redraw Figure E4.24, including the 
proper buffer(s) in it. 


FIGURE E4.24. 


25. Open-drain buffer 


a. The circuit of Figure E4.25 shows a wired gate constructed with open-drain buffers (note that 
the internal buffers are inverters). What type of gate is this? 


b. When y is low (~0V), current flows through the 10 kΩ pull-up resistor. What is the power 
dissipated by this resistor when only one nMOS transistor is ON? And when all are ON? 


FIGURE E4.25. 


26. Flip-flop timing analysis #1 


Figure E4.26 shows a DFF to which the signals clk, rst, and d shown on the right are applied. Draw 
the complete waveform for q, assuming that all propagation delays are negligible and that the DFF’s 
initial state is q='0'. 


FIGURE E4.26. 


27. Flip-flop timing analysis #2 


Figure E4.27 shows the divide-by-2 circuit of Example 4.9 with some modifications introduced in 
the clock path. Draw the waveforms for d, q, and x using the clock waveform as reference. Assume 
that the propagation delays are negligible. 


FIGURE E4.27. 


28. Shift register diagram 
a. Make a circuit diagram, as in Figure 4.26(a), for a 5-stage shift register. 


b. Say that the clock period is 50ns. What are the minimum and maximum time intervals that it 
can take for a bit presented to the input of this SR to reach the output? Assume that the propaga- 
tion delays are negligible. 


29. Shift register timing analysis 


Suppose that the signals clk, rst, and d depicted in Figure E4.29 are applied to the 4-stage shift 
register of Figure 4.26. Draw the corresponding waveforms at all circuit nodes (q1, q2, q3, and q4). Assume 
that the propagation delays are negligible. 


FIGURE E4.29. 


30. Number of flip-flops 


a. What is the minimum number of flip-flops needed to construct the following counters: 
(i) 0-to-99, (ii) 0-to-10,000. 


b. When constructing sequential upward counters with initial state zero, what is the largest count 
(decimal) value achievable with (i) 8 DFFs, (ii) 16 DFFs, (iii) 32 DFFs? 


31. Asynchronous counter timing analysis 


Draw the timing diagram for the asynchronous counter of Figure 4.29 and prove that it is indeed a 
sequential upward 0-to-7 counter with self-reset. 


32. Pseudo-random sequence generator 


a. Starting with "1111" in the shift register, write the truth table for the LFSR of Figure 4.31 and 
check whether the results match those given in Figure 4.32. 


b. Repeat the exercise starting from a different value, say "1000", and check if “circularly” exactly 
the same sequence is produced. 


Boolean Algebra 


Objective: This chapter describes the mathematical formalities behind binary functions. It includes 
Boolean algebra and its theorems, followed by standard function representation formats and correspond- 
ing standard circuit implementations. The study of Karnaugh maps and other function-simplification 
techniques is also included, along with a discussion on timing diagrams and glitch generation in combi- 
national circuits. 


Chapter Contents 


5.1 Boolean Algebra 

5.2 Truth Tables 

5.3. Minterms and SOP Equations 

5.4 Maxterms and POS Equations 

5.5 Standard Circuits for SOP and POS Equations 
5.6 Karnaugh Maps 

5.7. Large Karnaugh Maps 

5.8 Other Function-Simplification Techniques 

5.9 Propagation Delay and Glitches 

5.10 Exercises 


5.1 Boolean Algebra 


Formal analysis of digital circuits is based on Boolean algebra, whose initial foundations were laid by 
G. Boole [Boole54] in the 1850s. It contains a set of mathematical rules that govern a two-valued (binary) 
system represented by zeros and ones. Such rules are discussed in this chapter, where several examples 
are also presented. To represent bit or bit vector values, VHDL syntax will again be employed whenever 
appropriate, which consists of a pair of single quotes for single bits or a pair of double quotes for bit 
vectors. 

A Boolean function is a mathematical function involving binary variables, logical addition (“+”, 
also called OR operation), logical multiplication (“·”, also called AND operation), and logical 
inversion (“'”). In summary, Boolean functions can be implemented using only the three fundamental 
gates depicted in Figure 5.1, which were introduced in Chapter 4. (More precisely, because there 
are equivalences between NOR and NAND gates, that is, (a+b)'=a'·b' and (a·b)'=a'+b', any 
Boolean function can be implemented using only one of these two gates, that is, only NOR or only 
NAND gates.) 


FIGURE 5.1. The three fundamental operations employed in Boolean functions: Inversion, OR, and AND. 


Some examples of Boolean functions are shown below, where y represents the function and a, b, c, and 
d are binary variables. 


y=a·b 
y=a'·b+a·c·d 
y=a·b·c+b'·d+a·c'·d 


The first expression above has only one term (a·b, called a product term because it involves logical 
multiplication between its components), which contains two literals (a and b), where a literal is a 
variable or its complement. The second expression has two product terms (a'·b and a·c·d) with two 
and three literals, respectively. And the third expression has three product terms (a·b·c, b'·d, and 
a·c'·d), with three, two, and three literals, respectively. Because in these examples the product terms 
are added to produce the final result, the expressions are said to be written in SOP (sum-of-products) 
format. 

Note in the expressions above that the dot used to represent the AND operation was not omitted 
(for example, a·b could have been written as ab), because in actual designs (see VHDL chapters, for 
example) signal names normally include several letters, so the absence of the dot could cause confusion. 
For example, ena·x could be confused with e·n·a·x. 

The fundamental rules of Boolean algebra are summarized below, where a, b, and c are again binary 
variables and f() is a binary function. All principles and theorems are first summarized, then a series of 
examples are given. The proofs are straightforward, so most are left to the exercises section. 


Properties involving the OR function: 

a+0=a (5.1a) 
a+1=1 (5.1b) 
a+a=a (5.1c) 
a+a'=1 (5.1d) 
a+b=b+a (commutative) (5.1e) 
(a+b)+c=a+(b+c)=a+b+c (associative) (5.1f) 

Properties involving the AND function: 

a·1=a (5.2a) 
a·0=0 (5.2b) 
a·a=a (5.2c) 
a·a'=0 (5.2d) 
a·b=b·a (commutative) (5.2e) 
(a·b)·c=a·(b·c) (associative) (5.2f) 
a·(b+c)=a·b+a·c (distributive) (5.2g) 

Absorption theorem: 

a+a·b=a (5.3a) 
a+a'·b=a·b'+b=a+b (5.3b) 
a·b+a·b'=a (5.3c) 

Consensus theorem: 

a·b+b·c+a'·c=a·b+a'·c (5.4a) 
(a+b)·(b+c)·(a'+c)=(a+b)·(a'+c) (5.4b) 

Shannon's theorem: 

f(a, b, c...)=a'·f(0, b, c...)+a·f(1, b, c...) (5.5a) 
f(a, b, c...)=[a+f(0, b, c...)]·[a'+f(1, b, c...)] (5.5b) 

DeMorgan's law: 

(a+b+c+...)'=a'·b'·c'·... (5.6a) 
(a·b·c·...)'=a'+b'+c'+... (5.6b) 


Principle of duality: 
Any Boolean function remains unchanged when '0's and '1's as well as "+" and "·" are swapped (this 
is a generalization of DeMorgan's law). 


f(a, b, c, +, ·)=f'(a', b', c', ·, +) (5.7) 


Common-term theorem: 
Suppose that y=f() is an N-variable Boolean function with a common-term a. If the common-term 
occurs when y is expressed as a sum of products (SOP), then the following holds: 


y=a·b1+a·b2+...+a·bN=a·(b1+b2+...+bN) (5.8a) 


Likewise, if the common-term occurs when y is expressed as a product of sums (POS), then the 
following is true: 


y=(a+b1)·(a+b2)...(a+bN)=a+b1·b2...bN (5.8b) 
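
Because each variable can take only two values, any of the identities above can be confirmed by evaluating both sides for every input combination. The Python sketch below does this for the distributive property and for the absorption, consensus, and DeMorgan theorems; the helper names equivalent and inv are illustrative, and bits are represented by the integers 0 and 1.

```python
from itertools import product


def equivalent(f, g, n_vars):
    """True if f and g agree for every combination of n_vars input bits."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=n_vars))


def inv(x):
    """Logical inversion of a single bit (0 or 1)."""
    return 1 - x


# Each entry: (left-hand side, right-hand side, number of variables).
identities = {
    "5.2g": (lambda a, b, c: a & (b | c), lambda a, b, c: (a & b) | (a & c), 3),
    "5.3a": (lambda a, b: a | (a & b), lambda a, b: a, 2),
    "5.3b": (lambda a, b: a | (inv(a) & b), lambda a, b: a | b, 2),
    "5.3c": (lambda a, b: (a & b) | (a & inv(b)), lambda a, b: a, 2),
    "5.4a": (lambda a, b, c: (a & b) | (b & c) | (inv(a) & c),
             lambda a, b, c: (a & b) | (inv(a) & c), 3),
    "5.4b": (lambda a, b, c: (a | b) & (b | c) & (inv(a) | c),
             lambda a, b, c: (a | b) & (inv(a) | c), 3),
    "5.6a": (lambda a, b, c: inv(a | b | c), lambda a, b, c: inv(a) & inv(b) & inv(c), 3),
    "5.6b": (lambda a, b, c: inv(a & b & c), lambda a, b, c: inv(a) | inv(b) | inv(c), 3),
}

results = {name: equivalent(lhs, rhs, n) for name, (lhs, rhs, n) in identities.items()}
print(results)
```

An exhaustive check of this kind is not a substitute for the algebraic proofs requested in the exercises, but it is a quick way to catch a mistyped expression.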


As mentioned earlier, the proofs for all of the theorems above are straightforward, so most are left to 
the exercises section (except for one proof, given in the example below). Several examples, illustrating 
the use of these theorems, follow. 


EXAMPLE 5.1 ABSORPTION THEOREM 
Prove Equation 5.3(b) of the absorption theorem. 


SOLUTION 

We can write a=a·(b+b')=a·b+a·b'. We can also duplicate any term without affecting the result 
(because a+a=a). Therefore, a+a'·b=a·b+a·b'+a'·b=a·b+a·b'+a'·b+a·b=a·(b+b')+(a'+a)·b=a+b. 


EXAMPLE 5.2 SHANNON'S THEOREM 

Apply both parts of Shannon’s theorem to the function y=a'+b-c and check their validity. 


SOLUTION 


a. Note that f(0, b, c)=1+b·c=1 and f(1, b, c)=0+b·c=b·c. Therefore, y=f(a, b, c)=a'·f(0, b, c)+ 
a·f(1, b, c)=a'·1+a·b·c=a'+a·b·c=a'+b·c (in the last step, the absorption theorem was applied). 


b. y=f(a, b, c)=[a+f(0, b, c)]·[a'+f(1, b, c)]=(a+1)·(a'+b·c)=a'+b·c. 
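
The result of Example 5.2 can be cross-checked numerically. The Python sketch below builds both forms of Shannon's expansion around the first variable and compares them with the original function y=a'+b·c for all eight input combinations; the function names are illustrative.

```python
from itertools import product


def y(a, b, c):
    """The function of Example 5.2: y = a' + b.c"""
    return (1 - a) | (b & c)


def shannon_sop(f, a, *rest):
    """Equation 5.5(a): a'.f(0, ...) + a.f(1, ...)"""
    return ((1 - a) & f(0, *rest)) | (a & f(1, *rest))


def shannon_pos(f, a, *rest):
    """Equation 5.5(b): [a + f(0, ...)].[a' + f(1, ...)]"""
    return (a | f(0, *rest)) & ((1 - a) | f(1, *rest))


sop_ok = all(shannon_sop(y, *bits) == y(*bits) for bits in product((0, 1), repeat=3))
pos_ok = all(shannon_pos(y, *bits) == y(*bits) for bits in product((0, 1), repeat=3))
print(sop_ok, pos_ok)
```

Both helpers accept any function of one or more bits, so the same check can be repeated for other functions.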


EXAMPLE 5.3 DEMORGAN’S LAW 
Using DeMorgan’s law, simplify the following functions: 


a. y=(a'+b'·c')' 

b. y=[(a+b)·c+(b·c')']' 

SOLUTION 

a. y=(a'+b'·c')'=(a')'·(b'·c')'=a·(b+c) 

b. y=[(a+b)·c+(b·c')']'=[(a+b)·c+(b'+c)]'=[(a+b)·c]'·(b'+c)'=[(a+b)'+c']·b·c'=(a'·b'+c')· 
b·c'=b·c'. 

EXAMPLE 5.4 COMMON-TERM ANALYSIS 

If y=a1·a2·a3...aM+b1·b2·b3...bN, prove that y=Π(ai+bj). 

SOLUTION 

Π(ai+bj)=[(a1+b1)·(a1+b2)...(a1+bN)]...[(aM+b1)·(aM+b2)...(aM+bN)]. For each expression between 
brackets, Equation 5.8(b) can be applied, yielding [a1+b1·b2...bN]...[aM+b1·b2...bN]. Reapplying 
Equation 5.8(b) for the common term b1·b2...bN, y=a1·a2·a3...aM+b1·b2·b3...bN results. 


EXAMPLE 5.5 PRINCIPLE OF DUALITY #1 
Apply the principle of duality to the function y=a'+b·c and check its validity. 


SOLUTION 


First, it is important to place parentheses around the operations that have precedence (that is, 
“·”) because they will be replaced with “+” but must keep the precedence. Therefore, the dual 
of y=a'+(b·c) is y'=a·(b'+c'). Using DeMorgan's law, the latter can be manipulated as follows: 
(y')'=[a·(b'+c')]'=a'+(b'+c')'=a'+b·c. Hence the dual functions are indeed alike. 


EXAMPLE 5.6 PRINCIPLE OF DUALITY #2 
Apply the principle of duality to the gates shown in Figure 5.2(a). 


(a) (b) 


FIGURE 5.2. Duality principle applied to basic gates (Example 5.6). 


SOLUTION 


The solution is depicted in Figure 5.2(b). Note that '0' ↔ '1' (that is, a ↔ a') and AND ↔ OR were swapped. 
Two solutions are shown, one using ticks (') to represent inversion, the other using bubbles. 


EXAMPLE 5.7 CIRCUIT SIMPLIFICATION 


Prove that the circuits shown in Figure 5.3 are equivalent. 


FIGURE 5.3. Circuits of Example 5.7. 


SOLUTION 

The Boolean function for the circuit on the left is y=[a'+(b·c)'+d'+(a·d)']'. Using DeMorgan's law, 
the following results: y=(a'+b'+c'+d'+a'+d')'=(a'+b'+c'+d')'=a·b·c·d. Thus the circuits are 
indeed equivalent. 


5.2 Truth Tables 


As already seen in Chapter 4, truth tables are also a fundamental tool for digital circuit analysis. A truth 
table is simply a numeric representation of a Boolean function; therefore, for an N-variable function, 2^N 
rows are needed. 

Any truth table will fall in one of the four cases depicted in Figure 5.4, all for N=3 (thus with 8 entries). 
The function shown in the truth table of Figure 5.4(a) is y=a'+b-c. Note that the first column contains 
all possible input (variable) values, while the second contains the output (function) values. Observe also 
that the variable values are organized in increasing decimal order, that is, from "000" (decimal 0) to "111" 
(decimal 7). 
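
As an illustration of how such a table is built, the Python sketch below enumerates the 2^N input combinations in increasing decimal order and evaluates y=a'+b·c for each one. The helper name truth_table is illustrative.

```python
from itertools import product


def truth_table(f, n_vars):
    """Return (input string, output) pairs for all 2^N input combinations."""
    rows = []
    for bits in product((0, 1), repeat=n_vars):   # "000", "001", ..., "111"
        rows.append(("".join(str(b) for b in bits), f(*bits)))
    return rows


def y(a, b, c):
    """y = a' + b.c"""
    return (1 - a) | (b & c)


for inputs, output in truth_table(y, 3):
    print(inputs, output)
```

The function is '0' only for "100", "101", and "110", the three rows in which a='1' and b·c='0'.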

Another function is depicted in Figure 5.4(b), where a value different from '0' or '1', represented by 'X', 
can be observed. The 'X's indicate don't care values, meaning values that either will not be used or are not 
relevant, so they can be chosen at will by the circuit designer (a common choice is the one that minimizes 
the hardware). 

A third case is shown in Figure 5.4(c), which contains an incomplete truth table. This should always 
be avoided because when using an automated tool (like a VHDL synthesizer) to infer the hardware, the 
compiler might understand that the previous system state must be held, in which case registers will be 
inferred (thus wasting resources). 

Finally, a “degenerate” truth table is depicted in Figure 5.4(d), where fewer than 2^N rows are employed. 
Because not all entries (variable values) are explicitly provided, the rows cannot be organized in adja- 
cently increasing order. This type of table is normally used only when there are very few zeros or ones, 
so a compact representation results. 


5.3  Minterms and SOP Equations 


There are two standard formats for Boolean functions, which are called SOP (sum-of-products) and POS 
(product-of-sums). The former is described in this section, while the latter is seen in the next. 

For a Boolean function of N variables, a minterm is any product term containing N literals (recall 
that a literal is a variable or its complement). For example, a'-b'-c’, a-b’-c', and a-b-c are examples 
of minterms for f(a, b, c). Therefore, looking at truth tables (Figure 5.4), we conclude that each entry is 
indeed a minterm. 


FIGURE 5.4. (a) Regular truth table; (b) Truth table with “don't care” states; (c) Incomplete truth table; 
(d) Degenerate truth table (<2^N rows). 


Minterm           Maxterm            abc   y 
m0 = a'·b'·c'     M0 = a+b+c         000   0 
m1 = a'·b'·c      M1 = a+b+c'        001   0 
m2 = a'·b·c'      M2 = a+b'+c        010   1 
m3 = a'·b·c       M3 = a+b'+c'       011   0 
m4 = a·b'·c'      M4 = a'+b+c        100   0 
m5 = a·b'·c       M5 = a'+b+c'       101   0 
m6 = a·b·c'       M6 = a'+b'+c       110   1 
m7 = a·b·c        M7 = a'+b'+c'      111   1 


FIGURE 5.5. Truth table for y=a'·b·c'+a·b·c'+a·b·c (minterm expansion). 


Let us consider the function y=f(a, b, c) given below: 
y=a'·b·c'+a·b·c'+a·b·c (5.9a) 


Because all terms in Equation 5.9(a) are minterms, this expression is referred to as minterm expansion. 
Moreover, because it is also a sum of products, it is called SOP equation (SOP format). 

The truth table for Equation 5.9(a) is presented in Figure 5.5 with the corresponding minterms also 
included (represented by m_i, where i is the decimal value represented by abc in that row of the truth 
table). Note that y='1' occurs only for the minterms that appear in Equation 5.9(a), so another 
representation for that equation is the following: 


y=m2+m6+m7 (5.9b) 
Or, equivalently: 
y=Σm(2, 6, 7) (5.9c) 
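
The list of minterm indices can be obtained mechanically from any Boolean function by recording the rows of its truth table in which the output is '1'. The Python sketch below does this for Equation 5.9(a); the helper name minterm_indices is illustrative.

```python
from itertools import product


def minterm_indices(f, n_vars):
    """Decimal indices of the truth-table rows in which f is '1'."""
    indices = []
    for index, bits in enumerate(product((0, 1), repeat=n_vars)):
        if f(*bits) == 1:
            indices.append(index)
    return indices


def y(a, b, c):
    """Equation 5.9(a): y = a'.b.c' + a.b.c' + a.b.c"""
    na, nc = 1 - a, 1 - c
    return (na & b & nc) | (a & b & nc) | (a & b & c)


print(minterm_indices(y, 3))
```

The rows are generated in increasing decimal order, so the position of each row is the index i of its minterm.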


Every term of an SOP is called an implicant because when that term is '1' the function is '1' too. 
Therefore, all minterms that cause y='1' are implicants of y. However, not all minterms are prime implicants. 

A prime implicant is an implicant from which no literal can be removed. To illustrate this concept, let 
us examine Equation 5.9(a) again. That function can be simplified to the following (simplification 
techniques will be discussed ahead): 


y=a·b+b·c' (5.9d) 


Equation 5.9(d) has two terms from which no literal can be further removed, so a·b and b·c' are prime 
implicants of y. Note that in this particular example neither prime implicant is a minterm. 
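
The definitions of implicant and prime implicant translate directly into an exhaustive test. In the Python sketch below, a product term is represented by a dictionary that maps each variable in the term to the value it must have (an assumed encoding chosen for this illustration), and the function is that of Equation 5.9(a).

```python
from itertools import product

VARIABLES = ("a", "b", "c")


def y(a, b, c):
    """Equation 5.9(a): y = a'.b.c' + a.b.c' + a.b.c"""
    return ((1 - a) & b & (1 - c)) | (a & b & (1 - c)) | (a & b & c)


def is_implicant(term, f):
    """True if f is '1' whenever the product term is '1'.

    term example: {"a": 1, "b": 1} stands for a.b and {"b": 1, "c": 0} for b.c'.
    """
    for bits in product((0, 1), repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, bits))
        term_is_one = all(assignment[var] == value for var, value in term.items())
        if term_is_one and f(*bits) == 0:
            return False
    return True


def is_prime_implicant(term, f):
    """True if term is an implicant and no literal can be removed from it."""
    if not is_implicant(term, f):
        return False
    for var in term:
        reduced = {k: v for k, v in term.items() if k != var}
        if is_implicant(reduced, f):
            return False             # a shorter implicant exists
    return True


print(is_prime_implicant({"a": 1, "b": 1}, y))          # a.b
print(is_prime_implicant({"b": 1, "c": 0}, y))          # b.c'
print(is_prime_implicant({"a": 1, "b": 1, "c": 1}, y))  # minterm a.b.c
```

The minterm a·b·c is an implicant but not a prime implicant, because the literal c can be removed and the remaining term a·b still implies y='1'.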


EXAMPLE 5.8 MINTERMS AND PRIME IMPLICANTS 


Consider the irreducible function of four variables y=a'·c'+a'·b'·d'+a·b·c·d. Which terms are 
minterms and which are prime implicants? 


SOLUTION 


Because the equation is irreducible, all three terms are prime implicants. However, only one of them 
has four literals, that is, only a·b·c·d is a minterm (=m15). 


EXAMPLE 5.9 MINTERM EXPANSION AND IRREDUCIBLE SOP 


a. For the function of Figure 5.6(a), write the minterm expansion and obtain the corresponding 
irreducible SOP. Which prime implicants are minterms? 


b. Repeat the exercise for the function of Figure 5.6(b). 


FIGURE 5.6. Truth tables of Example 5.9. (a) Minterm expansion: y=m1+m2=a'·b+a·b'. (b) Minterm 
expansion: y=m2+m3+m4+m7=a'·b·c'+a'·b·c+a·b'·c'+a·b·c. 


SOLUTION 


Part (a): 
y=m1+m2=a'·b+a·b' 
Simplified expression: This is indeed the XOR function (compare the truth table of Figure 5.6(a) with 
that in Figure 4.15(a)), already in irreducible SOP format. Both prime implicants are minterms. 


Part (b): 
y=m2+m3+m4+m7=a'·b·c'+a'·b·c+a·b'·c'+a·b·c. 


Simplified expression: In the first and second terms, a'·b can be factored, while b·c can be factored 
in the second and fourth terms. Because any term can be duplicated without affecting the result, so 
can the second term. Therefore: 


y=a'·b·c'+a'·b·c+a'·b·c+a·b·c+a·b'·c' 
 =a'·b·(c'+c)+(a'+a)·b·c+a·b'·c' 
 =a'·b+b·c+a·b'·c' 


In this case, only the last prime implicant is a minterm (=m4). 


5.4 Maxterms and POS Equations 


We describe now the other standard function format, called POS (product-of-sums). 
For a Boolean function of N variables, a maxterm is any sum term containing N literals. For instance, 
a'+b'+c', a+b'+c', and a+b+c are examples of maxterms for f(a, b, c). 

Let us consider again the function y=f(a, b, c) given by Equation 5.9(a) and numerically displayed in 
the truth table of Figure 5.5. Instead of using the minterms for which y='1', we can take those for which 
y='0' and then complement the result, that is: 


y=(m0+m1+m3+m4+m5)' 
 =(a'·b'·c'+a'·b'·c+a'·b·c+a·b'·c'+a·b'·c)' 
 =(a+b+c)·(a+b+c')·(a+b'+c')·(a'+b+c)·(a'+b+c') 
The expression derived above, 
y=(a+b+c)·(a+b+c')·(a+b'+c')·(a'+b+c)·(a'+b+c') (5.10a) 


contains only maxterms, hence it is referred to as a maxterm expansion. Moreover, because it is also a 
product-of-sums, it is called POS equation (POS format). 

The truth table for Equation 5.10(a), which is the same as that for Equation 5.9(a), is shown in Figure 5.5, 
where the maxterms are also included (represented by M_i, where i is the decimal value represented by abc 
in that row of the truth table). 

Two other representations for Equation 5.10(a) are shown below. 


y=M0·M1·M3·M4·M5 (5.10b) 
y=ΠM(0, 1, 3, 4, 5) (5.10c) 


Minterm-Maxterm relationship 


Suppose that y=f() is a binary function of N variables. Without loss of generality, we can assume that 
f() is '1' for all minterms from 0 to n−1 and '0' for the others (n to 2^N−1). Thus we can write: 


y=Σm(0, 1,..., n−1)=[Σm(n, n+1,..., 2^N−1)]'=ΠM(n, n+1,..., 2^N−1) (5.11) 


where m and M represent minterms and maxterms, respectively, which are related by the following 
equation: 


M_i=m_i' (5.12) 
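
The minterm-maxterm relationship can be illustrated numerically with the function of Equation 5.9(a): the maxterm indices are simply the truth-table rows not covered by the minterm list, and the product of those maxterms reproduces the function. The Python sketch below is a minimal check with illustrative helper names.

```python
from itertools import product


def y(a, b, c):
    """Equation 5.9(a): y = a'.b.c' + a.b.c' + a.b.c"""
    return ((1 - a) & b & (1 - c)) | (a & b & (1 - c)) | (a & b & c)


def minterm_indices(f, n_vars):
    """Rows in which f is '1' (indices used in the minterm list)."""
    return [i for i, bits in enumerate(product((0, 1), repeat=n_vars)) if f(*bits) == 1]


def maxterm_indices(f, n_vars):
    """Rows in which f is '0' (indices used in the maxterm list)."""
    return [i for i, bits in enumerate(product((0, 1), repeat=n_vars)) if f(*bits) == 0]


def pos_value(indices, bits):
    """Evaluate the product of maxterms M_i for the given input bits.

    Maxterm M_i = m_i' is '0' only in row i, so the product is '0'
    exactly when the input row is one of the listed indices.
    """
    row = int("".join(str(b) for b in bits), 2)
    return 0 if row in indices else 1


minterms = minterm_indices(y, 3)
maxterms = maxterm_indices(y, 3)
print(minterms, maxterms)
pos_matches = all(pos_value(maxterms, bits) == y(*bits) for bits in product((0, 1), repeat=3))
print(pos_matches)
```

The two index lists never overlap and together cover all 2^N rows, which is the content of Equation 5.11.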


EXAMPLE 5.10 MAXTERM EXPANSION AND IRREDUCIBLE POS 


The truth tables of Figure 5.7 are the same as those of Figure 5.6 but with maxterms included in the 
extra column instead of minterms. 


a. For the function of Figure 5.7(a), write the maxterm expansion, show that the result is equivalent 
to that in Example 5.9, and obtain the corresponding irreducible POS. 


b. Repeat the exercise for the function of Figure 5.7(b). 


SOLUTION 


Part (a): 

y = M0·M3 = (a+b)·(a'+b') 

Proof of equivalence: Multiplying the terms in the expression above, (a+b)·(a'+b') = a·b' + a'·b 
results, which is the same expression obtained in Example 5.9. 

Irreducible POS expression: As mentioned in Example 5.9, this is the equation of a 2-input XOR gate 
with y = (a+b)·(a'+b') already in irreducible POS form. 


(a) 
Maxterm          ab    y 
M0 = a+b         00    0 
M1 = a+b'        01    1 
M2 = a'+b        10    1 
M3 = a'+b'       11    0 

(b) 
Maxterm          abc   y 
M0 = a+b+c       000   0 
M1 = a+b+c'      001   0 
M2 = a+b'+c      010   1 
M3 = a+b'+c'     011   1 
M4 = a'+b+c      100   1 
M5 = a'+b+c'     101   0 
M6 = a'+b'+c     110   0 
M7 = a'+b'+c'    111   1 

FIGURE 5.7. Truth tables of Example 5.10 (similar to those in Example 5.9). (a) Maxterm expansion: y = M0·M3 
= (a+b)·(a'+b'). (b) Maxterm expansion: y = M0·M1·M5·M6 = (a+b+c)·(a+b+c')·(a'+b+c')·(a'+b'+c). 


Part (b): 

y = M0·M1·M5·M6 = (a+b+c)·(a+b+c')·(a'+b+c')·(a'+b'+c) 

Proof of equivalence: In this case, the multiplication of the terms leads potentially to 3^4 = 81 terms, 
so another approach should be adopted. Equation 5.11 can be applied, that is, y = M0·M1·M5·M6 = 
m2 + m3 + m4 + m7, which is therefore the same equation derived in Example 5.9. 

Irreducible POS expression: A simple solution is to double-invert the SOP equation (from 
Example 5.9), that is, y = [(a'·b + b·c + a·b'·c')']' = [(a+b')·(b'+c')·(a'+b+c)]'. 


5.5 Standard Circuits for SOP and POS Equations 


As seen above, logic expressions can be represented in two standard formats called SOP and POS. For 
example: 

SOP: y1 = a·b + c·d + e·f 

POS: y2 = (a+b)·(c+d)·(e+f) 


Standard circuits for SOP equations 

Any SOP can be immediately implemented using AND gates in the first (product) layer and an OR 
gate in the second (sum) layer, as shown in Figure 5.8(a) for y = a·b + c·d. This architecture is equivalent 
to one that employs only NAND gates in both layers. To demonstrate this, in Figure 5.8(b), 
bubbles were inserted at both ends of the wires that interconnect the two layers (thus not affecting 
the result). Subsequently, the duality principle was applied to the gate in the second layer, resulting in 
the NAND-only circuit of Figure 5.8(c). Recall from Chapter 4 that 2-input NAND/NOR gates require 
only 4 transistors, while 2-input AND/OR gates require 6, so the circuit in Figure 5.8(c) requires less 
hardware than that in Figure 5.8(a). 

FIGURE 5.8. Principle of duality employed to convert the AND-OR circuit in (a) to a NAND-only circuit in (c). 
These circuits implement the SOP equation y = a·b + c·d. 

FIGURE 5.9. Principle of duality employed to convert the OR-AND circuit in (a) to a NOR-only circuit in (c). 
These circuits implement the POS equation y = (a+b)·(c+d). 


Standard circuits for POS equations 


Any POS can be immediately implemented using OR gates in the first (sum) layer and an AND gate in 
the second (product) layer as shown in Figure 5.9(a) for y = (a+b)·(c+d). This architecture is equivalent 
to one that employs only NOR gates in both layers. To demonstrate this, in Figure 5.9(b) bubbles were 
inserted at both ends of the wires that interconnect the two layers (thus not affecting the result), then 
the duality principle was applied to the gate in the second layer, resulting in the NOR-only circuit of 
Figure 5.9(c). Similarly to what occurred in Figure 5.8, the circuit in Figure 5.9(c) also requires less 
hardware than that in Figure 5.9(a). 
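
Both equivalences can be confirmed by exhaustive enumeration. The following sketch is only an illustration (the helper names nand and nor are hypothetical); it compares the two-layer AND-OR circuit with its NAND-only counterpart, and the two-layer OR-AND circuit with its NOR-only counterpart, for all 16 input combinations.

```python
from itertools import product

def nand(*x):
    """NAND of any number of inputs."""
    return int(not all(x))

def nor(*x):
    """NOR of any number of inputs."""
    return int(not any(x))

for a, b, c, d in product((0, 1), repeat=4):
    and_or = (a & b) | (c & d)                 # AND layer followed by OR
    nand_nand = nand(nand(a, b), nand(c, d))   # NAND gates in both layers
    or_and = (a | b) & (c | d)                 # OR layer followed by AND
    nor_nor = nor(nor(a, b), nor(c, d))        # NOR gates in both layers
    assert and_or == nand_nand
    assert or_and == nor_nor
```

With the transistor counts recalled above (4 per 2-input NAND/NOR gate, 6 per 2-input AND/OR gate), each three-gate circuit drops from 18 to 12 transistors.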


EXAMPLE 5.11 STANDARD SOP AND POS CIRCUITS 

a. Using the standard NAND-only SOP approach of Figure 5.8(c), draw a circuit that implements 
y1 = a + b·c + d·e·f. 
b. Using the standard NOR-only POS approach of Figure 5.9(c), draw a circuit that implements 
y2 = a·(b+c)·(d+e+f). 

SOLUTION 

Both circuits are depicted in Figure 5.10. Note that a one-input NAND or NOR is indeed an 
inverter. 

FIGURE 5.10. Standard SOP and POS circuits for (a) y1 = a + b·c + d·e·f and (b) y2 = a·(b+c)·(d+e+f). 

EXAMPLE 5.12 STANDARD SOP CIRCUIT 


Suppose that we want to draw a circuit that implements the trivial function y = a·b·c, which only 
requires a 3-input AND gate. This function, having only one term, can be considered either an SOP 
or a POS equation. Adopting the SOP case, the implementation then requires a 2-layer NAND-only 
circuit like that in Figure 5.8(c). Show that, even when such an approach is adopted, the circuit 
eventually gets reduced to a single 3-input AND gate. 


SOLUTION 


The solution is depicted in Figure 5.11. Being an SOP, y = a·b·c can be written as y = a·b·c + 
0 + 0 + ... Therefore, as shown in Figure 5.11(a), the first NAND receives abc, whereas all the others 
get "000" (only one additional gate is depicted). The output of the all-zero NAND is '1', shown in 
Figure 5.11(b). A 2-input NAND gate with a '1' in one input is simply an inverter, depicted in 
Figure 5.11(c), which annuls the bubble at the NAND output, hence resulting in the expected 
3-input AND gate of Figure 5.11(d). 


FIGURE 5.11. SOP-based implementation of y = a·b·c, which, as expected, gets reduced to a simple 
AND gate. 


EXAMPLE 5.13 STANDARD POS CIRCUIT 


Suppose that we want to draw a circuit that implements the same trivial function y = a·b·c seen 
above, this time using the POS-based approach, in which a 2-layer NOR-only circuit like that in 
Figure 5.9(c) is required. Show that, even when such an approach is adopted, the circuit eventually 
gets reduced to a single 3-input AND gate. 


SOLUTION 


The solution is depicted in Figure 5.12. Being a POS, y = a·b·c can be written as y = (a+0)·(b+0)·(c+0). 
Therefore, as shown in Figure 5.12(a), each NOR gate receives one variable and one '0'. Recall that 
a 2-input NOR gate with a '0' in one input is simply an inverter, as depicted in Figure 5.12(b). The 
inverters were replaced with bubbles in Figure 5.12(c), and then the duality principle was applied, 
converting the NOR gate with bubbles on both sides into the expected 3-input AND gate shown in 
Figure 5.12(d). 

FIGURE 5.12. POS-based implementation of y = a·b·c, which, as expected, gets again reduced to a simple 
AND gate. 

EXAMPLE 5.14 NAND-ONLY CIRCUIT #1 


It is very common in actual designs to have gates with a number of inputs (fan-in) that does not 
directly match the design needs. This kind of situation is examined in this and in the next example. 
Draw a circuit that implements the equation y = a·b·c using only NAND gates. Show two 
solutions: 

a. Using only 3-input gates. 


b. Using only 2-input gates. 


SOLUTION 


Part (a): 
The circuit is shown in Figure 5.13(a). In Figure 5.13(a1), the intended circuit is shown, which is a 
3-input AND gate. However, because only NAND gates are available, in Figure 5.13(a2) bubbles are 
inserted at both ends of the output wire (one at the AND output, which then becomes a NAND, and 
the other before y), thus not affecting the result. In Figure 5.13(a3), the output bubble is replaced with 
an inverter, which can be constructed as shown in Figure 5.13(d). 

FIGURE 5.13. Implementation of y = a·b·c using only (a) 3-input and (b) 2-input NAND gates. Inverter 
implementations are shown in (c) and (d). 


Part (b): 

The circuit is shown in Figure 5.13(b). In Figure 5.13(b1), the intended circuit is shown, which requires 
two 2-input AND gates. However, because only NAND gates are available, in Figure 5.13(b2) bubbles 
are inserted at both ends of the wires that depart from AND gates, thus converting those gates into 
NAND gates without affecting the result. In Figure 5.13(b3), the undesired (but inevitable) bubbles 
are replaced with inverters, constructed according to Figure 5.13(c). 


EXAMPLE 5.15 NAND-ONLY CIRCUIT #2 


Similarly to the example above, draw a circuit that implements the equation y = a·b + c·d·e using 
only NAND gates. Show two solutions: 


a. Using only 3-input gates. 
b. Using only 2-input gates. 


SOLUTION 


Part (a): 

The circuit is shown in Figure 5.14(a). In Figure 5.14(a1), the conventional AND-OR circuit for 
SOP implementations is shown (similar to Figure 5.8(a)). However, because only NAND gates 
are available, in Figure 5.14(a2) bubbles are inserted at both ends of all wires that depart from 
AND gates, thus converting them into NANDs without affecting the result. In Figure 5.14(a3), 
the duality principle is applied to the OR gate, resulting in a NAND-only circuit. (This sequence 
had already been seen in Figure 5.8, so the standard NAND-only circuit could have been drawn 
directly.) 

FIGURE 5.14. Implementation of y = a·b + c·d·e using only (a) 3-input and (b) 2-input NAND gates. The 
inverter is similar to that in Figure 5.13(c). 

Part (b): 

The circuit is shown in Figure 5.14(b). In Figure 5.14(b1), the traditional AND-OR circuit is 
shown, which requires two 2-input AND gates to implement the term c·d·e. However, because 
only NAND gates are available, in Figure 5.14(b2) bubbles are inserted at both ends of the 
wires that depart from AND gates, thus again converting them into NAND gates. In Figure 
5.14(b3), the duality principle is applied to the OR gate, converting it into a NAND gate, while 
the undesired (but inevitable) bubble is replaced with an inverter, constructed according to 
Figure 5.13(c). 
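
The 2-input solution can be checked against the original equation. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the inverter is built from a NAND gate with both inputs tied together, which is one common option and may or may not be the one drawn in Figure 5.13(c).

```python
from itertools import product

def nand2(x, y):
    """2-input NAND gate."""
    return int(not (x and y))

def inv(x):
    """Inverter made from a NAND with its inputs tied together (assumed form)."""
    return nand2(x, x)

def y_nand_only(a, b, c, d, e):
    cd = inv(nand2(c, d))           # c·d rebuilt from a NAND plus an inverter
    # second layer: NAND of the two first-layer NAND outputs
    return nand2(nand2(a, b), nand2(cd, e))

for a, b, c, d, e in product((0, 1), repeat=5):
    assert y_nand_only(a, b, c, d, e) == ((a & b) | (c & d & e))
```

The circuit uses five 2-input NAND gates in total, one of them acting as the inverter.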


5.6 Karnaugh Maps 


We saw in previous sections that many times a Boolean function allows simplifications, which are desir- 
able to save hardware resources, often leading to faster circuits and lower power consumption. 

The simplification of a binary function can be done basically in four main ways: (i) analytically, 
(ii) using Karnaugh maps, (iii) using the Quine-McCluskey algorithm, or (iv) using heuristic quasi- 
minimum methods. The first two are by-hand procedures, while the others allow systematic, computer- 
based implementations. 

Simplification techniques are normally based on part (c) of the absorption theorem, that is, a·b + a·b' = a, 
which is applied to the Boolean function expressed in SOP format. The main problem in the analytical 
approach (besides not being appropriate for computer-based simplification) is that it is very difficult to 
know when an irreducible (minimum) expression has been reached. For example, consider the function 
y = f(a, b, c) below: 


y = m0 + m2 + m6 + m7 = a'·b'·c' + a'·b·c' + a·b·c' + a·b·c     (5.13a) 

We can apply the absorption theorem to the pairs of terms 1-2, 2-3, and 3-4, resulting in the following: 

y = a'·c'·(b'+b) + b·c'·(a'+a) + a·b·(c'+c) = a'·c' + b·c' + a·b     (5.13b) 

However, we could also have applied it only to the pairs 1-2 and 3-4, that is: 

y = a'·c'·(b'+b) + a·b·(c'+c) = a'·c' + a·b     (5.13c) 


We know that the expressions above are equivalent, but Equation 5.13(c) is simpler than 
Equation 5.13(b). Moreover, it is not immediately obvious by looking at Equation 5.13(b) that an extra 
step can be taken (the absorption theorem can no longer be applied; the consensus theorem 
would now be needed). For that reason, the Karnaugh approach is preferred when performing 
by-hand optimization. 
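
The equivalence of the three forms of Equation 5.13 can be confirmed by enumeration. The sketch below is only an illustration: the function names are hypothetical, and the minterm list (m0, m2, m6, m7) is the one implied by the pairings described above.

```python
from itertools import product

def eq_513a(a, b, c):
    """Minterm expansion m0 + m2 + m6 + m7."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (na & nb & nc) | (na & b & nc) | (a & b & nc) | (a & b & c)

def eq_513b(a, b, c):
    """a'·c' + b·c' + a·b (three terms, six literals)."""
    na, nc = 1 - a, 1 - c
    return (na & nc) | (b & nc) | (a & b)

def eq_513c(a, b, c):
    """a'·c' + a·b (two terms, four literals)."""
    na, nc = 1 - a, 1 - c
    return (na & nc) | (a & b)

rows = list(product((0, 1), repeat=3))
assert all(eq_513a(*r) == eq_513b(*r) == eq_513c(*r) for r in rows)
```

The term b·c' in Equation 5.13(b) only covers minterms m2 and m6, which are already covered by a'·c' and a·b, respectively, which is why it can be dropped.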

A Karnaugh map is another way of representing the truth table of a Boolean function with the main 
purpose of easing its simplification. Figure 5.15(a) shows the general form of a Karnaugh map for a 
function f(a, b, c, d) of four variables. Note that the values for the pairs ab (top row) and cd (left column) 
are entered using Gray code (that is, two adjacent pairs differ by only one bit). Each box receives the 
value of one minterm, so 2^N boxes are needed (here, 2^4 = 16). The box marked with "0000" (decimal 0) 
receives the value of minterm m0, the box with "0100" (decimal 4) receives the value of m4, and so on. 
The minterms are explicitly written inside the boxes in Figure 5.15(b). 

FIGURE 5.15. Construction of a four-variable Karnaugh map. 
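
The layout described above can be generated programmatically. The following sketch is only an illustration (the names gray and kmap are hypothetical): it builds the Gray-coded column and row labels and lists which minterm index falls in each box, with ab selecting the column and cd selecting the row.

```python
def gray(n_bits):
    """Gray-code sequence of n_bits-wide strings: neighbors differ by one bit."""
    return [format(i ^ (i >> 1), f"0{n_bits}b") for i in range(2 ** n_bits)]

cols = gray(2)   # values of ab along the top row
rows = gray(2)   # values of cd along the left column

# Each box holds the decimal value of abcd, that is, the minterm index.
kmap = [[int(ab + cd, 2) for ab in cols] for cd in rows]

print("cd\\ab", cols)
for cd, row in zip(rows, kmap):
    print(cd, row)
```

The first row of the result holds minterms 0, 4, 12, and 8, in agreement with the boxes "0000" and "0100" mentioned in the text.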


Prime implicants 


Figure 5.16 shows three actual Karnaugh maps for functions of two (a, b), three (a, b, c), and four (a, b, c, d) 
variables. To simplify a function, all '1's (or '0's) must be grouped in sets as large as possible, whose sizes 
are powers of two (that is, 1, 2, 4, 8,...). Regarding the Gray encoding of the variables, note that the last 
row is adjacent to the first row (they differ by just one bit) and that the last column is adjacent to the first 
column, thus the map can be interpreted as a horizontal or vertical cylinder, as indicated by circular lines 
in Figure 5.15(b). If a group of '1's cannot be made any larger, then it is a prime implicant of that function. 

The map in Figure 5.16(a) has only one '1', which occurs when a='1' and b='1' (minterm m3). Therefore, 
the corresponding equation is y = a·b. 

The map in Figure 5.16(b) has three '1's, which can be collected in two groups, one with two '1's and 
the other with a single minterm. The group with two '1's occurs when a='0', b='1', and c='0' or '1' (hence 
the value of c does not matter), so the corresponding expression is a'·b. The other '1' occurs for a='1', 
b='0', and c='0', so its expression is a·b'·c'. Consequently, the complete minimum (irreducible) SOP 
equation is y = a'·b + a·b'·c'. 

Finally, the map in Figure 5.16(c) has six '1's, which can be collected in two groups, one with four '1's 
and the other with two '1's. The group with four '1's occurs when a='0' or '1' (hence a does not matter), 
b='1', c='0' or '1' (hence c does not matter either), and d='1', so its expression is b·d. The group with two 
'1's occurs for a='0' or '1' (hence a does not matter again) and b=c=d='0', so the corresponding 
expression is b'·c'·d'. Consequently, the complete minimum (irreducible) SOP equation is y = b·d + b'·c'·d'. 


Essential implicants 


As seen above, each maximized group of '1's in a Karnaugh map is a prime implicant, so the minimum 
SOP contains only prime implicants. However, not all prime implicants might be needed. As an example, 
consider the map shown in Figure 5.17, which contains four prime implicants (a·c'·d', a·b·c', b·c'·d, 
and a'·d). Note, however, that only one of the two prime implicants in the center is actually needed. 

An essential implicant is a prime implicant that contains at least one element (minterm) not covered by 
any other prime implicant. Thus in the case of Figure 5.17 only a·c'·d' and a'·d are essential implicants. 
If a function contains prime implicants that are not essential implicants, then the minimum (irreducible) 
SOP equation contains fewer terms than the total number of prime implicants. For 
example, the function in Figure 5.17 can be written in two ways (both with only three prime implicants, 
out of four observed in the Karnaugh map): 


y = a·c'·d' + a'·d + a·b·c'   or   y = a·c'·d' + a'·d + b·c'·d 

FIGURE 5.17. Function with four prime implicants, of which only two are essential implicants (a·c'·d' 
and a'·d). 
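
The definition of essential implicant translates directly into a coverage test. The sketch below is only an illustration under stated assumptions: the map of Figure 5.17 is taken to contain exactly the '1's covered by its four prime implicants, each implicant is written as a pattern over (a, b, c, d) with 'X' marking an absent variable, and the names used are hypothetical.

```python
from itertools import product

# Prime implicants of Figure 5.17 as patterns over (a, b, c, d).
primes = {
    "ac'd'": "1X00",
    "abc'":  "110X",
    "bc'd":  "X101",
    "a'd":   "0XX1",
}

def covers(pattern):
    """Set of minterm indices covered by a pattern such as '1X00'."""
    choices = [("0", "1") if ch == "X" else (ch,) for ch in pattern]
    return {int("".join(bits), 2) for bits in product(*choices)}

cover = {name: covers(p) for name, p in primes.items()}
ones = set().union(*cover.values())   # assumed '1' cells of the map

# A prime implicant is essential if it owns a minterm that no other one covers.
essential = [
    name for name, cells in cover.items()
    if any(
        all(m not in other for o, other in cover.items() if o != name)
        for m in cells
    )
]
print(essential)
```

Minterm 13 is the only cell shared by the two central implicants, which is why either of them (but not both) completes the cover.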


EXAMPLE 5.16 MINIMUM SOP 


Derive a minimum (irreducible) SOP equation for the Boolean function depicted in the Karnaugh 
map of Figure 5.18. 


FIGURE 5.18. Karnaugh map of Example 5.16, which contains “don’t care” values. 


SOLUTION 


Besides '0's and '1's, the map of Figure 5.18 also contains “don’t care” minterms (represented by 'X'), 
which can be freely included or not in the groups of '1's. The decision is normally made so as to 
maximize the group (prime implicant) sizes. Doing so, the following results: 


y = b'·c·d + a'·b·c' + a·d'. 

EXAMPLE 5.17 PROOFS FOR THE ABSORPTION THEOREM 

Using Karnaugh maps, prove all three parts of the absorption theorem (Equations 5.3(a)-(c)). 


SOLUTION 


The solution is depicted in Figure 5.19. For each equation, a corresponding two-variable Karnaugh 
map is drawn, showing all equation terms, from which the conclusions are self-explanatory. 


(a) a + a·b = a     (b) a + a'·b = a + b     (c) a·b + a·b' = a 


FIGURE 5.19. Proofs for the absorption theorem (Equations 5.3(a)-(c)). 


Karnaugh maps for zeros 


In the examples above, the minterms for which a '1' must be produced were grouped to obtain optimal SOP 
expressions. The opposite can also be done, that is, the minterms for which the output must be '0' can be 
collected, with the only restriction that the resulting expression must then be inverted. An optimal POS 
representation then results because the complement of an SOP is a POS of the complemented literals (Section 5.4). 

As an example, let us consider again the Karnaugh map of Figure 5.16(b). Grouping the '0's (and 
inverting the result), the following is obtained: 


y = (a'·b' + a·b + a·c)' 

Developing this equation, the following results: 

y = (a+b)·(a'+b')·(a'+c') 


Note that this equation indeed contains only the coordinates for the zeros but in complemented form 
and arranged in POS format. Just to verify the correctness of this equation, it can be expanded, resulting 
in y = a'·b + a·b'·c', which is the expression obtained earlier when grouping the '1's instead of the '0's. 

There is one exception, however, in which these two results (from grouping the ones and from grouping 
the zeros) are not necessarily equal. It can occur when there are “don’t care” states in the Karnaugh 
map because the grouping of such states is not unique. The reader is invited to write the POS equation 
for the zeros in the Karnaugh map of Figure 5.18 to verify this fact. 


5.7 Large Karnaugh Maps 


Although the Karnaugh maps above operate with N ≤ 4 variables, the concept can be easily extended to 
larger systems using Shannon’s theorem (Equation 5.5(a)), which states the following: 


f(a, b, c, ...) = a'·f(0, b, c, ...) + a·f(1, b, c, ...) 

For N=5, two 4-variable maps are then needed, one for the fifth variable (say a) equal to '0', the other 
for a='1'. Subsequently, the set of prime implicants obtained from the first map must be ANDed with a', 
while the second set must be ANDed with a. This procedure is illustrated in the following example. 


EXAMPLE 5.18 KARNAUGH MAP FOR N=5 


Derive an irreducible expression for the 5-variable function depicted in the truth table of Figure 5.20(a). 


minterm   abcde   y 
m0        00000   1 
m10       01010   1 
m11       01011   1 
m14       01110   1 
m15       01111   1 
m20       10100   1 
m21       10101   1 
m23       10111   1 
others            0 


FIGURE 5.20. (a) Truth table and (b)-(c) Karnaugh maps for the 5-variable function of Example 5.18. 


SOLUTION 


Two 4-variable Karnaugh maps are shown in Figures 5.20(b)-(c) for a='0' and a='1', respectively. As 
can be seen, all prime implicants are essential implicants because all contain at least one element not 
covered by other prime implicants. The corresponding equations are: 

y(a='0') = b'·c'·d'·e' + b·d 
y(a='1') = b'·c·d' + b'·c·e 

Using Shannon’s theorem, we obtain: 

y = a'·y(a='0') + a·y(a='1') 
  = a'·(b'·c'·d'·e' + b·d) + a·(b'·c·d' + b'·c·e) 
  = a'·b'·c'·d'·e' + a'·b·d + a·b'·c·d' + a·b'·c·e 
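
The result can be cross-checked against the truth table. The sketch below is only an illustration (all names are hypothetical): the set ONES holds the decimal values of the abcde codes listed with y='1' in the truth table, and the two sub-functions are the equations read from the maps for a='0' and a='1'.

```python
from itertools import product

# Decimal values of the abcde codes with y = '1' in the truth table.
ONES = {0, 10, 11, 14, 15, 20, 21, 23}

def f(a, b, c, d, e):
    """Reference function taken directly from the truth table."""
    return int((a << 4 | b << 3 | c << 2 | d << 1 | e) in ONES)

def y_a0(b, c, d, e):
    """Equation from the map for a = '0': b'·c'·d'·e' + b·d."""
    return ((1 - b) & (1 - c) & (1 - d) & (1 - e)) | (b & d)

def y_a1(b, c, d, e):
    """Equation from the map for a = '1': b'·c·d' + b'·c·e."""
    return ((1 - b) & c & (1 - d)) | ((1 - b) & c & e)

for a, b, c, d, e in product((0, 1), repeat=5):
    # Shannon's theorem: f = a'·f(0, ...) + a·f(1, ...)
    shannon = ((1 - a) & y_a0(b, c, d, e)) | (a & y_a1(b, c, d, e))
    assert shannon == f(a, b, c, d, e)
```

The same pattern extends to N=6 by expanding around two variables, at the cost of four 4-variable maps.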


5.8 Other Function-Simplification Techniques 


The simplification method described above (Karnaugh maps) is a graphical tool that is only adequate 
for small systems. Larger systems require a computer-based approach. 


5.8.1 The Quine-McCluskey Algorithm 


The first algorithm adequate for computer-based simplifications was developed by Quine and 
McCluskey [McCluskey65] in the 1960s. Like analytical simplification, this algorithm too is based 
on Equation 5.3(c) of the absorption theorem, which states that a·b + a·b' = a. 

The algorithm starts with a minterm expansion (SOP form), like the following example: 

y = m0 + m4 + m5 + m6 + m10 + m11 + m14 + m15     (5.14) 


This function is also shown in the Karnaugh map of Figure 5.21(b), where the minterms of Equation 5.14 
(see minterm positions in Figure 5.21(a)) were replaced with '1's, while all the others received '0's. First, it is 
observed that for a·b + a·b' = a to apply, the corresponding minterms or groups of minterms must be 
adjacent (that is, must differ by only one bit). Therefore, to save computation time, the minterms of Equation 5.14 
are divided into groups according to the number of '1's that they contain, as shown in the table of 
Figure 5.21(c), where group A contains the vector with zero '1's, group B contains those with one '1', and so 
on. This is important because group A only needs to be compared with group B, group B with group C, etc. 

When the minterms have been properly separated, the first iteration occurs, which consists of comparing 
the first vector in group A against all vectors in group B, and so on. The results from this iteration can 
be seen in Figure 5.21(d), where in the comparison of groups A-B only one adjacent vector was found, 
that is, minterms 0 ("0000") and 4 ("0100"), thus resulting from their union "0X00," where 'X' stands for 
“don’t care.” At the end of the first iteration, all vectors in Figure 5.21(c) that participated in at least one 
group are checked out (in this example, all were eliminated, marked with a gray shade). 

In the next iteration, the table of Figure 5.21(d) is employed, with the first group (AB) again compared 
against the second (BC), the second (BC) against the third (CD), and so on. The results are shown in the 
table of Figure 5.21(e). Again, all terms from Figure 5.21(d) that participated in at least one group are 
eliminated (gray area), as well as any repetition in Figure 5.21(e). Because no other grouping is possible after 
the second iteration, the algorithm ends, and the leftover terms (not shaded out) are the prime implicants 
of the function. In other words, 


y = a'·c'·d' + a'·b·c' + a'·b·d' + b·c·d' + a·c 

Note that these prime implicants coincide with those shown in Figure 5.21(b). However, one of these 
prime implicants (b·c·d') is redundant, so it could have been left out (at the expense of additional 
computation effort). 
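
The merging iterations can be sketched in a few lines of code. This is only an illustration, not a full Quine-McCluskey implementation: the function names are hypothetical, the minterm list (0, 4, 5, 6, 10, 11, 14, 15) is the one consistent with the prime implicants listed above, every pair of terms is compared in each pass (the grouping by number of '1's, which only saves time, is omitted), and the final selection of a minimum cover is not performed.

```python
from itertools import combinations

def combine(p, q):
    """Merge two patterns that differ in exactly one fixed bit, else None."""
    diff = [i for i, (x, y) in enumerate(zip(p, q)) if x != y]
    if len(diff) == 1 and "X" not in (p[diff[0]], q[diff[0]]):
        i = diff[0]
        return p[:i] + "X" + p[i + 1:]
    return None

def prime_implicants(minterms, n_bits):
    """Repeat a·b + a·b' = a until no further merging is possible."""
    terms = {format(m, f"0{n_bits}b") for m in minterms}
    primes = set()
    while terms:
        used, merged = set(), set()
        for p, q in combinations(sorted(terms), 2):
            r = combine(p, q)
            if r is not None:
                merged.add(r)          # a set removes repeated results
                used.update((p, q))    # both terms are checked out
        primes |= terms - used         # leftovers are prime implicants
        terms = merged
    return primes

primes = prime_implicants([0, 4, 5, 6, 10, 11, 14, 15], 4)
print(sorted(primes))
```

Reading each pattern over (a, b, c, d), the five results correspond to a'·c'·d', a'·b·c', a'·b·d', b·c·d', and a·c.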
FIGURE 5.21. Function simplification using the Quine-McCluskey algorithm. 

5.8.2 Other Simplification Algorithms 


A major limitation of the Quine-McCluskey algorithm is that its time complexity grows exponentially 
with the number of variables. Moreover, it is not desirable to have gates with too many inputs because 
a gate’s speed decreases as the fan-in (number of inputs) increases. Consequently, most practical circuits 
limit the fan-in to a relatively low value. For example, when using CMOS gates, this limit is normally 
between 4 and 8. 

Because of the fan-in constraint, minimum SOP equations, which might involve very large product 
terms as well as a large number of such terms, are not necessarily the best implementation 
equations for a given technology. A typical example is the use of CPLD/FPGA devices (Chapter 18), 
in which large equations must be broken down into smaller equations to fit internal construction 
constraints. 

Consequently, modern automated design tools employ heuristic algorithms instead, which lead to 
quasi-minimal solutions tailored for specific technologies. Therefore, the Quine-McCluskey algorithm is 
now basically only of historical interest. 


5.9 Propagation Delay and Glitches 


We conclude this chapter with a brief discussion of propagation delay and glitches in which Karnaugh 
maps can again be employed. 

As already mentioned in Chapter 4 (Section 4.10), digital circuits can be divided into two large groups, 
called combinational circuits and sequential circuits. A circuit is combinational when its output depends 
solely on its current inputs. For example, all gates employed in this chapter (AND, NAND, OR, etc.) are 
combinational. In contrast, a circuit is sequential if its output is affected by previous system states, in which 
case storage elements (flip-flops) are necessary, as well as a clock signal to control the system evolution 
(to set the timing). It was also mentioned that counters are good examples of circuits in this category 
because the next output value depends on the present output value. 

As will be seen in later chapters, sequential circuits can easily generate glitches (undesired voltage/ 
current spikes at the output), so special care must be taken in the designs. However, what we want to 
point out here is that combinational circuits are also subject to glitch generation. 

First, we recall from Chapter 4 (Figure 4.8) that any circuit response to a stimulus takes a certain 
amount of time to propagate through the circuit. Such a delay is called propagation delay and is measured 
under two distinct circumstances, which are depicted in Figure 5.22. One represents the propagation delay 
high-to-low (tpHL), which is the time interval between the occurrence of a stimulus and the corresponding 
high-to-low transition at the circuit output, measured at the midpoint between the logic voltages 
(for example, at 1.65 V if the logic voltages are 0 V and 3.3 V). The other represents the propagation delay 
low-to-high (tpLH) with a similar definition. These time intervals are illustrated in Figure 5.22 where y is 
an inverted version of x, which can then be produced, for example, by one of the gates shown on the left 
of the figure. 

FIGURE 5.22. Propagation delay in a combinational circuit. 

The aspect that we want to examine in this section is why glitches can occur in combinational circuits, 
as well as how Karnaugh maps can be used to prevent them (so they are not only for function 
simplification). 

Consider the Boolean function depicted in the truth table of Figure 5.23(a) whose corresponding 
Karnaugh map is shown in Figure 5.23(b), from which the irreducible SOP equation y = b·c' + a·c 
results. 

This equation can be implemented in several ways, with one alternative depicted in Figure 5.23(c). 
A corresponding timing diagram is presented in Figure 5.23(d), with a and b fixed at '1' and c transitioning 
from '1' to '0'. While c='1', the input is abc="111" (minterm m7), so the circuit is in the box "111" of the 
Karnaugh map. When c changes to '0', the new input is abc="110" (minterm m6), so the circuit moves to 
a new box, "110," of the Karnaugh map. This transition is indicated by an arrow in Figure 5.23(b). Even 
though y='1' in both boxes, it will be shown that, during the transition from one box to the other, a 
momentary glitch (toward zero) can occur in y. 

For simplicity, the propagation delay of all gates (inverter and NANDs) was considered to be the 
same (=T). Note that, when c changes to '0', one of the internal signals changes to '1' before x3 has had 
time to change to '0', so a momentary pulse toward zero (a glitch, shown in Figure 5.23(d)) is inevitable 
at the output. 

The glitch described above can be avoided with the help of a Karnaugh map. Note that there 
are two neighboring essential implicants with a glitch occurring when the system transitions 
from one to the other in the direction indicated by the arrow. There also is, however, another 
prime implicant (a·b, not shown in the figure), which is not an essential implicant because it is 
completely covered by the other two. However, this redundant implicant covers precisely the 
transition where the glitch occurs, so its inclusion would prevent it. The glitch-free function would 
then be y = b·c' + a·c + a·b. 
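
The glitch and its removal can be reproduced with a unit-delay simulation. The sketch below is only an illustration under stated assumptions: the gate arrangement (one inverter producing c', one NAND per product term, and an output NAND) is a plausible reading of the inverter-and-NANDs circuit described above, the signal names nc, p, q, and r are hypothetical, and every gate output is taken to lag its inputs by exactly one time step T.

```python
def nand(*inputs):
    """NAND of any number of inputs."""
    return int(not all(inputs))

def simulate(c_wave, with_consensus=False, a=1, b=1):
    """Return the output y at each step for the given waveform of c."""
    c = c_wave[0]
    # Start from the steady state for the initial value of c.
    nc = 1 - c                                  # inverter output (c')
    p, q, r = nand(b, nc), nand(a, c), nand(a, b)
    y = nand(p, q, r) if with_consensus else nand(p, q)
    trace = [y]
    for c_next in c_wave[1:]:
        # All gates switch together, each seeing the previous step's values.
        y_new = nand(p, q, r) if with_consensus else nand(p, q)
        p_new, q_new, nc_new = nand(b, nc), nand(a, c), 1 - c
        nc, p, q, y, c = nc_new, p_new, q_new, y_new, c_next
        trace.append(y)
    return trace

c_wave = [1, 0, 0, 0, 0, 0]                         # c falls from '1' to '0'
print(simulate(c_wave))                             # y = b·c' + a·c
print(simulate(c_wave, with_consensus=True))        # y = b·c' + a·c + a·b
```

Under these assumptions the first trace contains a single '0' lasting one step T, while the second stays at '1' because the term a·b holds the output NAND's third input at '0' throughout the transition.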

A final note regards the applicability of such a procedure to prevent glitches in combinational circuits, 
which is clearly only feasible in very small designs. A more common approach in actual designs is to 
simply sample the results after a large enough time delay that ensures that all signals have had time to 
settle. 

FIGURE 5.23. Glitch generation in a combinational circuit. 

5.10 Exercises 


1. Consensus theorem 
Prove both parts of the consensus theorem (Equations 5.4(a)-(b)). 
2. Shannon’s theorem 
Apply both parts of Shannon’s theorem (Equations 5.5(a)-(b)) to the function y = a'·b + b·c + a·b'·c' 
and check their validity. 
3. Common-term theorem 
Prove Equation 5.8(b) of the common-term theorem. 
4. Common-term extension 


If y = (a+c1)·(a+c2)...(a+cM)·(b+c1)·(b+c2)...(b+cN), where N≥M, then prove that 
y = a·b + b·c1·c2...cM + c1·c2...cN. 


5. Absorption and consensus theorems 
Check the equalities below using the absorption or consensus theorems. 
a. a·b' + a·b·c = a·b' + a·c 
b. a·b' + b·c + a·c = a·b' + b·c 
6. Binary identities 
Suppose that a, b, and c are three binary variables. Show the following: 
a. (a+b)=(a+c) does not imply necessarily that b=c. 
b. a·b = a·c does not imply necessarily that b=c. 
7. XOR properties 
Prove the XOR properties below. 
a. Associative: a⊕(b⊕c) = (a⊕b)⊕c 
b. Distributive: a·(b⊕c) = a·b ⊕ a·c 
8. XOR functions 
Convert the XOR functions below into SOP equations. 
a. a:b-c@a-b-c 
b. a·b·c ⊕ (a·b·c)' 
c. a ⊕ a·b ⊕ a·b·c 
9. DeMorgan’s law #1 
Using DeMorgan’s law, simplify the Boolean functions below. 
a. y = [a·(b·c)'·(d+(a'·d'))]' 
b. y = a + b'·[c + d'·(a+b)']' 
c. y = [((a+b)'+c)·(a+(b+c)')·(a+b+c)']' 
10. DeMorgan’s law #2 
Using DeMorgan’s law, simplify the Boolean functions below. 
a. [a'·(b'+c')]' 
b. [a + (a+b)'·(a'⊕b)' + c']' 
c. [a⊕b·(a+(b⊕c)')]' 
11. Circuit simplification #1 


Simplify the circuits shown in Figure E5.11. 


FIGURE E5.11. 


12. Circuit simplification #2 
Simplify the circuits shown in Figure E5.12. 


FIGURE E5.12. 


13. Principle of duality for AND gates 


a. Apply the principle of duality to each gate of Figure E5.13 and draw the resulting circuit (use 
bubbles instead of ticks). 


b. Write the equation for the original and for the dual circuit and check whether they are alike. 


FIGURE E5.13. 

14. Principle of duality for OR gates 


a. Apply the principle of duality to each gate of Figure E5.14 and draw the resulting circuit (use 
bubbles instead of ticks). 


b. Write the equation for the original and for the dual circuit and check whether they are alike. 


FIGURE E5.14. 


15. Principle of duality for XOR gates 


a. Apply the principle of duality to each gate of Figure E5.15 and draw the resulting circuit (use 
bubbles instead of ticks). 


b. Write the equation for the original and for the dual circuit and check whether they are alike. 


FIGURE E5.15. 


16. Minterm/Maxterm expansion #1 
Consider the function expressed in the truth table of Figure E5.16. 
a. Complete the minterm and maxterm expressions in the truth table. 
b. Write the corresponding minterm expansion in all three forms shown in Equations 5.9(a)-(c). 
c. Write the corresponding maxterm expansion in all three forms shown in Equations 5.10(a)-(c). 


Minterm Maxterm abc 


Mo= ~ at+b+c 000 


ox 


m,= 001 1 
M2= m= 010 1 
M3= M;= 011 0 
mM4= Mz= 100 0 
ms= Ms = 101 


=/;a 


Tz 


m;=a.b.c Mi = 1141 


oO 


FIGURE E5.16. 


17. Minterm/Maxterm expansion #2 


Suppose that the minterm expansion of a certain 3-variable Boolean function is y= + M3 +1M14+Ms5+M¢. 
Write its corresponding maxterm expansion. 


18. Minterm/Maxterm expansion #3 


Suppose that the minterm expansion of a certain 4-variable Boolean function is y=m,+m3+ 
Mm,+ Ms; +m. Write its corresponding maxterm expansion. 


19. Prime and essential implicants 
a. Define prime implicant. 
b. Define essential implicant. 
c. Suppose that the irreducible (minimum) SOP equation of a given Boolean function has 5 terms. 
How many prime implicants and how many essential implicants does this function have? 


20. Standard POS circuit 


Suppose that we want to draw a circuit that implements the trivial function y=a+b+c using the 
POS-based approach, in which a 2-layer NOR-only circuit like that in Figure 5.9(c) is required. Show 
that, even when such an approach is adopted, the circuit eventually gets reduced to a single 3-input 
OR gate. 


21. Standard SOP circuit 


Suppose that we want to draw a circuit that implements the same trivial function y=a+b+c seen 
above, this time using the SOP-based approach in which a 2-layer NAND-only circuit like that in 
Figure 5.8(c) is required. Show that, even when such an approach is adopted, the circuit eventually 
gets reduced to a single 3-input OR gate. 


22. Function implementation #1 

Draw a circuit capable of implementing the function y=a+b+c using: 
a. Only OR gates. 
b. Only 3-input NOR gates. 
c. Only 2-input NOR gates. 
d. Only NAND gates. 

23. Function implementation #2 

Draw a circuit capable of implementing the function y=a+b+c+d using: 
a. Only OR gates. 
b. Only NOR gates. 
c. Only 2-input NOR gates. 
d. Only NAND gates. 

24. Function implementation #3 

Draw a circuit capable of implementing the function y=a·b·c·d using: 
a. Only AND gates. 

b. Only NAND gates. 
c. Only 2-input NAND gates. 
d. Only NOR gates. 

25. Function implementation #4 

Draw a circuit capable of implementing the function y=a+b·c+d·e·f+g·h·i·j using: 
a. AND gates in the first layer and an OR gate in the second layer. 
b. Only NAND gates in both layers. 

26. Function implementation #5 

Draw a circuit capable of implementing the function y=a·(b+c)·(d+e+f)·(g+h+i+j) using: 
a. OR gates in the first layer and an AND gate in the second layer. 
b. Only NOR gates in both layers. 

27. Function implementation #6 

Draw a circuit capable of implementing the function y=a·(b+c)·(d+e+f) using: 
a. Only NAND gates. 
b. Only 2-input NAND gates. 

28. Function implementation #7 

Draw a circuit capable of implementing the function y=a·(b+c)·(d+e+f) using: 
a. Only NOR gates. 
b. Only 2-input NOR gates. 

29. Consensus theorem 

Using a 3-variable Karnaugh map, check the consensus theorem. 

30. Analytical function simplification 

Using the analytical function simplification technique, simplify the Boolean functions below. 
a. y=a·b+a'·b·c'+a·b'·c 
b. y=a'·b+a·b·c'+a'·b'·c 
c. y=a·b'·c'+a·b'·c·d+b'·c·d'+b'·c'·d' 

31. Function simplification with Karnaugh maps #1 

Using Karnaugh maps, simplify the functions in the exercise above and then compare the results. 

32. Function simplification with Karnaugh maps #2 


For the function y=f(a, b, c) depicted in the Karnaugh map of Figure E5.32, complete the 
following: 


a. What are the prime implicants? Which of them are also essential implicants? 


b. Obtain an irreducible (minimum) SOP equation for y. 
c. Draw a circuit that implements y using AND gates in the first layer and an OR gate in the second 
layer. Assume that the complemented versions of a, b, and c are also available. 


d. Repeat part (c) above using only NAND gates. 


FIGURE E5.32. 


33. Function simplification with Karnaugh maps #3 


For the function y=f(a, b, c, d) depicted in the Karnaugh map of Figure E5.33, complete the 
following: 


a. What are the prime implicants? Which of them are also essential implicants? 
b. Obtain an irreducible (minimum) SOP equation for y. 


c. Draw a circuit that implements y using AND gates in the first layer and an OR gate in the second 
layer. Assume that the complemented versions of a, b, c, and d are also available. 


d. Repeat part (c) above using only NAND gates. 


FIGURE E5.33. 


34. Function simplification with Karnaugh maps #4 


For the function y=f(a, b, c, d) depicted in the Karnaugh map of Figure E5.34, complete the 
following: 


a. What are the prime implicants? Which of them are also essential implicants? 
b. Obtain an irreducible (minimum) SOP equation for y. 


c. Draw a circuit that implements y using AND gates in the first layer and an OR gate in the second 
layer. Assume that the complemented versions of a, b, c, and d are also available. 


d. Repeat part (c) above using only NAND gates. 


FIGURE E5.34. 


35. Function simplification with Karnaugh maps #5 


For the function y=f(a, b, c, d) depicted in the Karnaugh map of Figure E5.35, complete the following: 


a. What are the prime implicants? Which of them are also essential implicants? 
b. Obtain an irreducible (minimum) SOP equation for y. 
c. Draw a circuit that implements y using AND gates in the first layer and an OR gate in the second 
layer. Assume that the complemented versions of a, b, c, and d are also available. 
d. Repeat part (c) above using only NAND gates. 
FIGURE E5.35. 


36. Function simplification with Karnaugh maps #6 


Using a Karnaugh map, derive a minimum (irreducible) SOP expression for each function y 
described in the truth tables of Figure E5.36. 


(a){_abc | y | 
| 000 | 1 | 


Others | 0 


Others 0 


[| Others | 0 | 


FIGURE E5.36. 


37. Large Karnaugh map #1 


Assume that y=f(a, b, c, d, e) is a 5-variable Boolean function given by y=, +ms5+Mg+ My +143+ 
My5 + Myg + Myo. 


a. Draw the corresponding truth table (it can be in degenerate form as in Figure 5.20(a)). 
b. Draw two Karnaugh maps, one for a='0' and the other for a='1' (as in Figures 5.20(b)-(c)). 
c. Find an irreducible SOP equation for y. 

38. Large Karnaugh map #2 


Assume that y=f(a, b, c, d, e) is a 5-variable Boolean function given by y=img+ M4+ Mg +114) +M47+ 


a. Draw the corresponding truth table (it can be in degenerate form as in Figure 5.20(a)). 
b. Draw two Karnaugh maps, one for a='0' and the other for a='1' (as in Figures 5.20(b)-(c)). 
c. Find an irreducible SOP equation for y. 

39. Combinational circuit with glitches #1 


The circuit in Figure E5.39 exhibits a glitch at the output during one of the input signal transitions 
(similar to what happens with the circuit in Figure 5.23). 


a. Draw its Karnaugh map and determine the essential implicants. 


b. When the circuit jumps from one of the essential implicants to the other in a certain direction, 
a glitch occurs in y. Which is this transition? In other words, which one of the three input signals 
must change and in which direction ('0' to '1' or '1' to '0')? 


c. Draw the corresponding timing diagram and show the glitch. 


d. How can this glitch be prevented? 


FIGURE E5.39. 


40. Combinational circuit with glitches #2 


Devise another combinational circuit (like that above) that is also subject to glitches at the output 
during certain input transitions. 


Line Codes 


Objective: The need for communication between system units, forming larger, integrated networks, 
and the need for large data-storage spaces are two fundamental components of modern digital designs. 
The digital codes employed in such cases are collectively known as line codes, while the corresponding 
data-protection codes are collectively known as error-detecting/correcting codes. With the integration of 
subsystems to perform these tasks onto the same chip (or circuit) that performs other, more conventional 
digital tasks, a basic knowledge of these techniques becomes indispensable to the digital designer. For 
that reason, an introduction to each one of these topics is included in this text with the former presented 
in this chapter and the latter in the next. 

To a certain extent, line codes can be viewed as a continuation of Chapter 2, in which several binary codes 
were already described (for representing numbers and characters). Line codes are indispensable in data 
transmission and data storage applications because they modify the data stream, making it more appropriate 
for the given communication channel (like Ethernet cables) or storage media (like magnetic or optical 
memory). The following families of line codes are described in this chapter: Unipolar, Polar, Bipolar, 
Biphase/Manchester, MLT, mB/nB, and PAM. In the examples, emphasis is given to Internet-based applications. 


Chapter Contents 


6.1 The Use of Line Codes 

6.2 Parameters and Types of Line Codes 
6.3 Unipolar Codes 

6.4 Polar Codes 

6.5 Bipolar Codes 

6.6 Biphase/Manchester Codes 

6.7 MLT Codes 

6.8 mB/nB Codes 

6.9 PAM Codes 

6.10 Exercises 


6.1 The Use of Line Codes 


In Chapter 2 we described several codes for representing decimal numbers (sequential binary, Gray, 
BCD, floating-point, etc.) and also codes for representing characters (ASCII and Unicode). In this 
chapter and in the next we introduce two other groups of binary codes, collectively called line 
codes (Chapter 6) and error-detecting/correcting codes (Chapter 7). Such codes are often used together, 
mainly in data transmission and data storage applications. However, while the former makes 
the data more appropriate for a given communications/storage medium, the latter adds protection 
against errors to it. 


Concept 


The basic usage of line codes is illustrated in Figure 6.1, which shows a serial data stream x being transmitted 
from location A to location B through a communications channel. To make the data sequence more “appropriate” 
for the given channel, x is modified by a line encoder, producing y, which is the data stream actually 
sent to B. At the receiving end, a line decoder returns y to its original form, x. 

Note in Figure 6.1 that a line encoder modifies the data sequence, thus the word “code” has a 
stricter meaning than in Chapter 2 (in Chapter 2, it was used sometimes to actually indicate a different data 
structure, like Gray or BCD codes, while on other occasions it simply meant a different representation for 
the same data structure, like octal and hexadecimal codes). 

Even though illustrated above for data transmission, line codes are also employed in other applications. 
An important example is in data storage, particularly in magnetic and optical (CDs, DVDs) media, where 
specific encoding schemes are necessary. The particular case of audio CDs, which combine line codes with 
error-correcting codes, will be described in detail in Section 7.5. 


Ethernet applications 


An actual application for line codes is shown in Figure 6.2, where Ethernet interfaces (which include 
the encoder-decoder pairs) can be seen. The channel in this case is the well-known blue cable used for 
Internet access, which contains four pairs of twisted copper wires (known as unshielded twisted pair, or 
UTP) that are proper for short distance communication (typically up to 100m). Three of the four Ethernet 
interfaces that operate with UTPs are included in Figure 6.2. 

In Figure 6.2(a), the first Ethernet interface for UTPs, called 10Base-T, is shown. It uses only two of the 
four twisted pairs, normally operating in simplex (unidirectional) mode, with a data rate of 10 Mbps in 
each pair. Manchester is the line code employed in this case (described ahead). The UTP is category 3, 
which is recommended for signals up to 16 MHz. 

In Figure 6.2(b), the very popular 100Base-TX Ethernet interface is depicted (which is probably what is 
connected to your computer right now). Again, only two of the four twisted pairs are used, usually also 
operating in simplex mode. However, even though the (blue) cable has the same appearance, its category 
is now 5, which allows signals of up to 100 MHz. A different type of line code is employed in this case; it 
is a combination of two line codes called 4B/5B and MLT-3 (both described ahead). The symbol rate in 
each UTP is 125 MBaud (where 1 Baud = 1 symbol/second), which in this case gives an actual information 
rate of 100 Mbps in each direction. It will be shown in the description of the MLT-3 code that a 125 Mbps 
signal can be transmitted with a spectrum under 40 MHz, thus well under the cable limit of 100 MHz. 

Finally, Figure 6.2(c) shows the 1000Base-T Ethernet interface. In it, all four pairs are used and operate 
in full-duplex (simultaneous bidirectional) mode with a symbol rate of 125 MBaud per pair, totaling 500 
MBaud in each direction. Because in this encoding each symbol contains two bits, the actual information 
rate is 1 Gbps in each direction. The line code employed in this case is called 4D-PAM5, combined with 
trellis encoding for FEC (forward error correction) and data scrambling to whiten the spectrum and 
reduce the average DC voltage (scramblers are studied in Section 14.8). The UTP is still the 4-pair (blue) 
cable, now of category 5e, which still limits the spectrum to 100 MHz. It will be shown in the description 
of the corresponding line codes how such a high information rate can be fit into this cable. 
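
The information rates quoted above follow from simple arithmetic on the symbol rate. The short sketch below reproduces the 100Base-TX and 1000Base-T figures; the function name `info_rate_mbps` and its parameters are illustrative choices, not part of any standard.

```python
def info_rate_mbps(symbol_rate_mbaud, info_bits, line_symbols, pairs=1):
    """Information rate (Mbps, one direction).

    info_bits / line_symbols is the number of information bits carried
    per transmitted symbol; pairs is the number of twisted pairs that
    carry data in that direction.
    """
    return symbol_rate_mbaud * info_bits * pairs / line_symbols


# 100Base-TX: 4B/5B places 4 information bits in every 5 line symbols.
rate_100base_tx = info_rate_mbps(125, info_bits=4, line_symbols=5)

# 1000Base-T: each symbol carries 2 bits, and all 4 pairs are used.
rate_1000base_t = info_rate_mbps(125, info_bits=2, line_symbols=1, pairs=4)

print(rate_100base_tx, rate_1000base_t)
```

The first call gives 100 Mbps and the second 1000 Mbps (1 Gbps), matching the values stated for Figures 6.2(b) and 6.2(c).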




FIGURE 6.1. Line encoder-decoder pair. 


(a) 10Base-T NIC to 10Base-T NIC: 4-pair UTP cat. 3 (2 simplex typical, 2 not used); Manchester code; Info=10Mbps 
(b) 100Base-TX NIC to 100Base-TX NIC: 4-pair UTP cat. 5 (2 simplex typical, 2 not used); 4B/5B + MLT-3; 125MBaud; Total info=100Mbps 
(c) 1000Base-T NIC to 1000Base-T NIC: 4-pair UTP cat. 5e (all full-duplex); Scrambling + FEC + 4D-PAM5; 125MBaud per pair; Total info=1Gbps 


FIGURE 6.2. Ethernet interfaces based on twisted pairs: (a) 10Base-T, (b) 100Base-TX, and (c) 1000Base-T. 
The respective transmission modes and line codes are indicated in the figures. 


6.2 Parameters and Types of Line Codes 


Several parameters are taken into consideration when evaluating a line code. A brief description of the 
main ones follows. A list with the main line codes is provided subsequently. 


Line code parameters 


DC component: A DC component corresponds to the waveform’s average voltage. For example, if the 
bits are represented by '0'=0V and '1'=+V, and they are equally likely, then the average DC voltage is 
V/2 (see Figure 6.3). This is undesirable because it is a large amount of energy transmitted without conveying 
any information. Moreover, DC signals cannot propagate in certain types of lines, like transformer-coupled 
telephone lines. 

Code spectrum: Given that any communications channel has a finite frequency response (that is, it 
limits high-frequency propagation), it is important that the spectrum of the signal to be transmitted be as 
confined to the allowed frequency range as possible to prevent distortion. This is illustrated in Figure 6.3 
where the transmitted signal is a square wave but the received one looks more like a sinusoid because the 
high-frequency components are more attenuated than the low-frequency ones. For example, as mentioned 
earlier, the maximum frequency in categories 5 and 5e UTPs is around 100 MHz. 

Additionally, for a given total power, the more spread the spectrum is, the less it irradiates, because spreading 
avoids high-energy harmonics. Indeed, because serious irradiation constraints are imposed for frequencies 
above 30 MHz, only spread spectrums are allowed in UTPs above this value (achieved, for example, with 
data scrambling). 


Transmitter (encoder) → Unshielded twisted pair (UTP) → Receiver (decoder) 

FIGURE 6.3. Data transmission illustrating the signal’s DC component and the channel's limited spectrum. 


Line Codes 

Unipolar codes: NRZ, RZ, NRZ-I 
Polar codes: NRZ, RZ, NRZ-I 
Bipolar codes: NRZ, RZ (AMI) 
Biphase codes: Manchester, Differential Manchester 
MLT codes: MLT-3 
mB/nB codes: 4B/5B, 8B/10B 
PAM codes: 4D-PAM5 

Acronyms: 
NRZ = Nonreturn to Zero 
RZ = Return to Zero 
NRZ-I = NRZ-Invert 
AMI = Alternate Mark Inversion 
MLT = Multilevel Transition 
mB/nB = m Bits/n Bits 
PAM = Pulse Amplitude Modulation 

FIGURE 6.4. Line codes described in this chapter. 


Transition density: If the data transmission is synchronous, the decoder must have information about 
the clock used in the encoder to correctly recover the bit stream. This information can be passed in a 
separate channel (a wire, for example) containing the clock, or it can be recovered from the bit stream 
itself. The first option is obviously too costly, so the second is employed. The recovery (with a PLL (phase 
locked loop) circuit, Section 14.6) is possible if the received signal contains a substantial number of 
transitions ('0'→'1', '1'→'0'). 

As will be shown, some codes provide a transition density of 100%, meaning that for every bit there 
is a transition. Others, on the other hand, can remain a long time without any activity, making clock 
recovery (synchronism) very difficult. Codes that do not provide transitions in all time slots are often 
measured in terms of maximum run length, which is the longest run of consecutive '0's or '1's that the code can 
produce (of course, the smaller this number, the better from the synchronization point of view). 
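
As a rough illustration of these two measures, the sketch below computes the maximum run length and a simple transition density for a string of line levels written as '0'/'1' characters. The helper names are hypothetical, and measuring density as the fraction of adjacent positions whose levels differ is an assumption made here for simplicity.

```python
from itertools import groupby


def max_run_length(levels):
    """Length of the longest run of identical consecutive symbols."""
    return max(len(list(run)) for _, run in groupby(levels))


def transition_density(levels):
    """Fraction of adjacent symbol pairs whose levels differ."""
    changes = sum(a != b for a, b in zip(levels, levels[1:]))
    return changes / (len(levels) - 1)


print(max_run_length("01001101"))      # longest run in the example sequence
print(transition_density("01010101"))  # alternating levels: every pair differs
print(transition_density("00001111"))  # one transition only
```

A decoder that recovers the clock from the data would favor streams with a small maximum run length and a high transition density.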

Implementation complexity: This is another crucial aspect, which includes the hardware complexity 
as well as the time complexity. From the hardware perspective, the encoder/decoder pair should be 
simple, compact, and low power to be easily incorporated into fully integrated systems. From the time 
perspective, the encoding/decoding procedures should be simple (fast) enough to allow communication 
(encoding-decoding) at the desired speed. 


Types of line codes 


There is a large variety of line codes, each with its own advantages, disadvantages, and intended 
applications. Fourteen types (listed in Figure 6.4) are selected for presentation in the sections that follow. 
The first 11 codes of Figure 6.4 are illustrated in Figure 6.5, while the other three are illustrated 
separately. The column on the right of Figure 6.5 summarizes the average (~) or exact (=) DC voltage 
of each code, as well as its main (1st harmonic) spectral frequency, accompanied by the bit sequence 
that causes the latter to peak. 


6.3 Unipolar Codes 


The first three groups in Figure 6.5, called unipolar, polar, and bipolar, have the following meaning: 
unipolar employs only the voltages 0V and +V; polar employs -V and +V (0V is used only for RZ); and 
bipolar uses -V, 0V, and +V. 

Figure 6.5 also shows that these three groups contain options called NRZ (nonreturn to zero), RZ 
(return to zero), and NRZ-I (NRZ invert). Such names are misleading because NRZ means that the 
voltage does not return to zero while data='1' (it does return to zero when '0' occurs), and RZ means that 
the voltage does return to zero while data='1'. NRZ-I means that the voltage does not return to 0 V while 
data='1', but it alternates between +V and -V. Particular details are described below. 


Data: 0 1 0 0 1 1 0 1 

Group        Code                      DC comp.   AC spectrum peak at (for) 
Unipolar     NRZ                       ~V/2       fck/2 (0101...) 
Unipolar     RZ                        ~V/4       fck (1111...) 
Unipolar     NRZ-I                     ~V/2       fck/2 (1111...) 
Polar        NRZ                       ~0         fck/2 (0101...) 
Polar        RZ                        ~0         fck (0000...) or (1111...) 
Polar        NRZ-I                     ~0         fck/2 (1111...) 
Bipolar      NRZ                       =0         fck/2 (1111...) 
Bipolar      RZ (AMI)                  =0         fck/2 (1111...) 
Biphase      Manchester                =0         fck (0000...) or (1111...) 
Biphase      Differential Manchester   =0         fck (0000...) 
Multilevel   MLT-3                     =0         fck/4 (1111...) 

FIGURE 6.5. Illustration including 11 of the 14 line codes listed in Figure 6.4. The encoded sequence is 
"01001101". The average (~) or exact (=) voltage of the DC component and the frequency of the main AC 
component (1st harmonic) for each code are listed on the right along with the bit sequence that causes the 
latter to peak. 


FIGURE 6.6. Illustration showing that in an NRZ-I code the fundamental harmonic is fck/2, which peaks 
(highest energy) when the data sequence "111...." occurs. 


Unipolar NRZ 


As shown in Figure 6.5, unipolar NRZ is a regular binary code, so it employs '0'=0V and '1'=+V. If the 
bits are equally likely, then the average DC voltage is V/2 ("~" in Figure 6.5 indicates average). Note that 
the main AC component is fck/2 (where fck is the clock frequency), and peaks when the sequence "0101..." 
occurs (one period of the sinusoid corresponds to two data bits, that is, two clock periods). Besides the 
high DC component, a long series of '0's or of '1's makes synchronization very difficult, so this code is not 
practical for regular data transmission applications. 


Unipolar RZ 


Here '0'=0V, as before, but '1'=+V/0V, that is, the voltage returns to 0 V while the bit is '1'. This decreases 
the average DC level to V/4, but it increases the spectrum (see the column on the right of Figure 6.5). 
Consequently, this code too is limited to very simple applications. 


Unipolar NRZ-I 


The NRZ invert code consists of alternating the output voltage between 0 V and +V when a '1' is found. 
It prevents long periods without signal transitions when "1111..." occurs, though it cannot prevent it 
when "000..." happens. Its main attribute is that it is a differential code, so decoding is more reliable 
because detecting a voltage transition is easier than detecting a voltage level. 

Its DC component and AC spectrum are similar to unipolar NRZ, but the largest first harmonic voltage 
occurs when the sequence of bits is "1111..." rather than "0101...." This is illustrated in Figure 6.6 
where a sinusoid (first harmonic) with frequency fck/2 can be seen for data="111...." 

This type of encoding is employed in commercial music CDs where the transition from land 
to pit or from pit to land is a '1', while no transition is a '0' (in this type of CD, data are recorded by 
very small indentations, called pits, disposed in a spiral track; the flat region between two pits is 
called land). 
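
The three unipolar codes can be sketched in a few lines. The functions below are a minimal illustration, assuming two samples per bit (first and second half of the clock period), V normalized to 1, and an NRZ-I output that starts at 0 V and toggles at the beginning of each '1' bit; all names are hypothetical.

```python
def unipolar_nrz(bits, v=1.0):
    """'0' -> 0 V, '1' -> +V during the whole bit period."""
    return [v if b == "1" else 0.0 for b in bits for _ in range(2)]


def unipolar_rz(bits, v=1.0):
    """'1' -> +V in the first half of the period, then back to 0 V."""
    out = []
    for b in bits:
        out += [v, 0.0] if b == "1" else [0.0, 0.0]
    return out


def unipolar_nrzi(bits, v=1.0):
    """Output toggles between 0 V and +V whenever a '1' occurs."""
    level, out = 0.0, []          # assumed initial level: 0 V
    for b in bits:
        if b == "1":
            level = v - level     # 0 -> V or V -> 0
        out += [level, level]
    return out


def dc_level(samples):
    """Average voltage of the sampled waveform."""
    return sum(samples) / len(samples)


data = "01001101"                       # sequence used in Figure 6.5
print(dc_level(unipolar_nrz(data)))     # V/2 for this balanced sequence
print(dc_level(unipolar_rz(data)))      # V/4 for this balanced sequence
print(unipolar_nrzi(data)[::2])         # one level per bit
```

Because the example sequence has as many '1's as '0's, the NRZ and RZ averages come out at exactly V/2 and V/4, the values quoted in the text for equally likely bits.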


6.4 Polar Codes 


Polar codes are similar to unipolar codes except for the fact that they employ -V and +V (plus 0V for 
RZ) instead of 0V and +V. 


Polar NRZ 


As shown in Figure 6.5, polar NRZ is similar to unipolar NRZ, but now with '0'=-V and '1'=+V, so the 
DC component oscillates about 0 V instead of V/2. The main spectral component is still the same. 


Polar RZ 


Compared to the unipolar RZ, polar RZ code uses '0'=—V and '1'=+V, so the bits return to zero (in the 
middle of the clock period) while data='0' and while data='1'. Like unipolar RZ, this doubles the main 
spectral AC component, and its peak occurs in two cases rather than one (for "0000..." and "1111..."). 
On the other hand, the average DC voltage is now zero. 


Polar NRZ-I 


Like unipolar NRZ invert, the encoder switches its output when a '1' occurs in the bit stream. This is 
another differential code, so decoding is more reliable (detecting a voltage transition is easier than 
detecting a voltage level), and it is used in some transition-based storage media, like magnetic memories. 
Its main AC component is similar to unipolar NRZ-I, but the DC component is better (~0). 
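
A minimal sketch of the three polar codes follows, again with two samples per bit and V normalized to 1. The function names and the assumed initial NRZ-I level of -V are illustrative only.

```python
def polar_nrz(bits, v=1.0):
    """'0' -> -V, '1' -> +V during the whole bit period."""
    return [v if b == "1" else -v for b in bits for _ in range(2)]


def polar_rz(bits, v=1.0):
    """'0' -> -V, '1' -> +V, returning to 0 V at mid-period."""
    out = []
    for b in bits:
        out += [v if b == "1" else -v, 0.0]
    return out


def polar_nrzi(bits, v=1.0):
    """Output switches between -V and +V whenever a '1' occurs."""
    level, out = -v, []           # assumed initial level: -V
    for b in bits:
        if b == "1":
            level = -level
        out += [level, level]
    return out


data = "01001101"
print(sum(polar_nrz(data)) / 16)   # average voltage of polar NRZ
print(sum(polar_rz(data)) / 16)    # average voltage of polar RZ
print(polar_nrzi(data)[::2])       # one level per bit
```

For this sequence, which has four '1's and four '0's, the polar NRZ and RZ averages are zero, in contrast with the V/2 and V/4 of the unipolar versions.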


6.5 Bipolar Codes 


As shown in Figure 6.5, bipolar codes employ '0'=0V and '1'=±V. 


Bipolar NRZ 

In bipolar NRZ, '0'=0V, while '1' alternates between +V and -V. It achieves true-zero DC voltage 
(assuming that the number of ones is even) without enlarging the spectrum (peak still at fck/2, which 
occurs for "1111..."). 


Bipolar RZ 

Bipolar RZ, also called AMI (alternate mark inversion), is similar to bipolar NRZ except for the fact that 
the '1's return to 0V in the middle of the clock period. The DC components and peak frequencies are 
similar (though their overall harmonic contents are not), again occurring when data="1111...." 
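
The alternation of the '1' polarity can be sketched as follows. This is an illustrative encoder only, assuming two samples per bit, V normalized to 1, and a first mark of +V; the function names are hypothetical.

```python
def bipolar_nrz(bits, v=1.0):
    """'0' -> 0 V; successive '1's alternate between +V and -V."""
    mark, out = v, []             # assumed polarity of the first '1': +V
    for b in bits:
        if b == "1":
            out += [mark, mark]
            mark = -mark
        else:
            out += [0.0, 0.0]
    return out


def bipolar_rz(bits, v=1.0):
    """AMI: like bipolar NRZ, but each '1' returns to 0 V at mid-period."""
    mark, out = v, []
    for b in bits:
        if b == "1":
            out += [mark, 0.0]
            mark = -mark
        else:
            out += [0.0, 0.0]
    return out


data = "01001101"
print(bipolar_nrz(data)[::2])     # one level per bit
print(sum(bipolar_nrz(data)))     # even number of '1's: marks cancel
print(sum(bipolar_nrz("111")))    # odd number of '1's: one mark is left over
```

The last two lines illustrate the condition stated above: the DC component cancels exactly only when the number of '1's is even.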


6.6 Biphase/Manchester Codes 


Biphase codes, also called Manchester codes, are also depicted in Figure 6.5. At the middle of the clock 
period, a transition always occurs, which is either from —V to +V or from +V to —V. By providing a 100% 
transition density, clock recovery (for synchronism) is easier and guaranteed (at the expense of spectrum). 


Regular Manchester code 

In this biphase code, '0' is represented by a +V to -V transition, while a '1' is represented by a 
transition from -V to +V. The transition density is therefore 100%, so clock recovery is simpler and 
guaranteed, but the spectrum is enlarged; note that the frequency of its main component is twice the regular 
value (that is, fck instead of fck/2), and peaks for two data patterns ("0000..." and "1111..."). As shown in 
Figure 6.2, this code is employed in the 10Base-T Ethernet interface. 


FIGURE 6.7. MLT-3 encoding for the data sequence "111...." 


Differential Manchester code 


In this differential biphase code, the direction of the transitions changes when a '1' occurs in the bit 
stream. Therefore, if the received signal is sampled at 3T/4 of one time slot and at T/4 of the next, equal 
voltages indicate a '1', while different voltages signify a '0' (differential decoding is more reliable). Again 
the DC voltage is truly zero, but the first harmonic is again high (fck), occurring when data="0000...." (as listed in Figure 6.5). 
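
Both biphase codes, and the differential decoding rule just described, can be sketched as below. The code assumes two samples per time slot (which stand for the T/4 and 3T/4 sampling instants), V normalized to 1, and an initial +V to -V transition direction for the differential encoder; the names are hypothetical.

```python
def manchester(bits, v=1.0):
    """'0' -> +V to -V transition, '1' -> -V to +V transition."""
    out = []
    for b in bits:
        out += [-v, v] if b == "1" else [v, -v]
    return out


def diff_manchester(bits, v=1.0):
    """Direction of the mid-period transition changes when a '1' occurs."""
    first, second = v, -v         # assumed initial transition direction
    out = []
    for b in bits:
        if b == "1":
            first, second = second, first
        out += [first, second]
    return out


def diff_manchester_decode(samples, v=1.0):
    """Compare the end of one slot (3T/4) with the start of the next (T/4)."""
    prev = -v                     # assumed level just before the first slot
    bits = []
    for i in range(0, len(samples), 2):
        bits.append("1" if samples[i] == prev else "0")
        prev = samples[i + 1]
    return "".join(bits)


data = "01001101"
encoded = diff_manchester(data)
print(manchester("01"))
print(diff_manchester_decode(encoded) == data)
print(sum(encoded))               # every slot holds +V and -V: zero DC
```

Because every time slot contains one +V half and one -V half, the average voltage is zero regardless of the data, which is the "truly zero" DC property mentioned above.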


6.7 MLT Codes 


MLT (multilevel transition) codes operate with more than two voltage levels, which are accessed 
sequentially when '1' occurs in the data sequence. Their purpose is to reduce the signal’s spectrum. 


MLT-3 code 


The most common MLT code is MLT-3, depicted in the last plot of Figure 6.5. It is simply a 3-level 
sequence of the type..., -1, 0, +1, 0, -1,..., controlled by the '1's, which causes the first harmonic to 
be reduced to fck/4. This fact is illustrated in Figure 6.7, which shows MLT-3 encoding for the data 
sequence "111...." Note that one sinusoid period corresponds to four clock periods with the additional 
property of providing zero DC voltage. As shown in Figure 6.2, this code, combined with the 4B/5B 
code described below, is used in the 100Base-TX Ethernet interface. 
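
The MLT-3 rule amounts to stepping through the cycle 0, +1, 0, -1 each time a '1' occurs. The sketch below shows this with one level per bit; the function name and the assumed starting point (level 0, about to move to +1) are illustrative.

```python
def mlt3(bits):
    """Advance through the level cycle on every '1'; hold the level on '0'."""
    cycle = [0, 1, 0, -1]
    idx, out = 0, []              # assumed start: level 0, next step is +1
    for b in bits:
        if b == "1":
            idx = (idx + 1) % 4
        out.append(cycle[idx])
    return out


print(mlt3("11111111"))   # pattern repeats every four bits
print(mlt3("01001101"))   # sequence used in Figure 6.5
```

For an all-'1's input the level pattern repeats every four bits, which is why the first harmonic sits at fck/4.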


6.8 mB/nB Codes 


In all codes described above, the correspondence between the number of bits at the input of the encoder 
and at its output is 1:1 (so the data rate equals the information rate). In mB/nB codes the relationship is 
m:n, that is, m information bits enter the encoder, which produces n (>m) data bits (hence a higher data 
rate is needed to attain the desired information rate). This is done to improve some of the code 
parameters described earlier, notably transition density and DC component. This effect can be observed in the 
4B/5B and 8B/10B codes described below, which are the main members of the mB/nB code family. 


4B/5B code 

Because with 4 bits there are 16 codewords, while with 5 bits there are 32, codewords with at least two 
'1's and one '0' can be chosen, thus guaranteeing enough transitions for clock recovery (synchronization) 
and a reasonable DC balance. The corresponding translation table is presented in Figure 6.8. 


4B/5B code 

Input   Output 
0000    11110 
0001    01001 
... 
1000    10010 
1001    10011 
... 

Control characters: Q (Quiet), I (Idle), R (Reset) 


FIGURE 6.8. Encoding-decoding table for the 4B/5B code. 


Input (serial) → Deserializer → 3B/4B and 5B/6B encoders (with control unit) → Serializer → Output (serial) 


FIGURE 6.9. Simplified 8B/10B encoder architecture. 


As mentioned in Section 6.1, this code is used in 100Base-TX Ethernet, which then requires the 
transmission of 125 Mbps (also called 125 MBaud, where 1 Baud = 1 symbol/second) to achieve an actual 
information rate of 100 Mbps. Because this frequency is too high for the category 5 UTP cable and much 
above the 30 MHz irradiation constraint, the output bit stream is passed through an MLT-3 encoder 
(Section 6.7). Because the frequency of the main AC component in the latter is fck/4, the fundamental 
harmonic in the cable gets reduced to 125/4=31.25 MHz, thus complying with the cable frequency range 
and also maintaining the only high-energy harmonic close to the recommended limit of 30 MHz. 
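
The sketch below illustrates 4B/5B encoding together with the rate arithmetic of this section. The data mapping is the commonly published 4B/5B table and should be treated as an assumption to be checked against Figure 6.8; control characters are omitted, and the names `FOUR_B_FIVE_B` and `encode_4b5b` are hypothetical.

```python
# Commonly published 4B/5B data mapping (assumed to match Figure 6.8).
FOUR_B_FIVE_B = {
    "0000": "11110", "0001": "01001", "0010": "10100", "0011": "10101",
    "0100": "01010", "0101": "01011", "0110": "01110", "0111": "01111",
    "1000": "10010", "1001": "10011", "1010": "10110", "1011": "10111",
    "1100": "11010", "1101": "11011", "1110": "11100", "1111": "11101",
}


def encode_4b5b(bits):
    """Replace every 4-bit group of the input string by its 5-bit codeword."""
    if len(bits) % 4:
        raise ValueError("input length must be a multiple of 4")
    return "".join(FOUR_B_FIVE_B[bits[i:i + 4]] for i in range(0, len(bits), 4))


info_rate_mbps = 100
line_rate_mbaud = info_rate_mbps * 5 / 4       # 5 line bits per 4 info bits
mlt3_fundamental_mhz = line_rate_mbaud / 4     # MLT-3 main component at fck/4

print(encode_4b5b("00000001"))
print(line_rate_mbaud, mlt3_fundamental_mhz)
```

Every codeword in this mapping has at least two '1's and at least one '0', the selection criterion described above, and the two computed values are the 125 MBaud line rate and the 31.25 MHz fundamental quoted in the text.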


8B/10B code 


Introduced by IBM in 1983 [Widner83], this is the most popular member of the mB/nB code family. Each 
8-bit block (one byte) is encoded with 10 bits. The choice of 8-bit blocks is a natural one because bits are 
generally handled in multiples of bytes. 

The general encoder architecture is depicted in Figure 6.9. The incoming bit stream, which is serial, 
is stored in the deserializer (clock not shown) to form the 8-bit block called "HGFEDCBA" (the same 
notation used in the original paper is employed here, where A and a represent the LSBs of the input and 
output vectors, respectively). This block is broken into two subblocks, one with the 3 MSBs, the other 
with the 5 LSBs, which are passed through two specially designed encoders of sizes 3B/4B and 5B/6B, 
respectively. The resulting 10-bit output (denoted by "jhgfiedcba") is then reserialized by the last circuit 
block. Note in the architecture the presence of a control unit that is responsible for handling special 
characters (called K) and for calculating RD (running disparity). 

In simple terms, this code operates as follows. Because there are only 2^8=256 input patterns, while 
the output allows 2^10=1024 patterns, the codewords can be chosen in a way to improve some of the 
code parameters, notably transition density and DC balance. Moreover, additional (special purpose) 
codewords can also be created for data transmission control (like packet delimiters). A total of 268 
codewords were picked for this code (plus their complements), of which 256 are for the 8-bit inputs and 
12 are for the special characters (K codewords). Such codewords are listed in Figure 6.10 (for the regular 
codewords only the initial eight and final eight are shown) where the internal vector separation only 
highlights the results from the 3B/4B and 5B/6B encoders of Figure 6.9. 

As can be seen in Figure 6.10, the chosen codewords have four '0's and six '1's (disparity = +2), or five '0's 
and five '1's (disparity = 0), or six '0's and four '1's (disparity = -2). This guarantees a large number of 
transitions, hence easy synchronization. Each codeword is available in two forms, one with disparity = 0 
or +2 and its bitwise complement, therefore with disparity = 0 or -2. The encoder keeps track of the 
accumulated disparity (called running disparity; see RD in Figure 6.9); if RD=+2, for example, then the next 
time a codeword with disparity ≠ 0 must be transmitted the encoder picks that with disparity = -2. This 
guarantees a DC voltage of practically 0V. 
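
The bookkeeping described above can be sketched with two of the special codewords of Figure 6.10. This is a simplified illustration, not the full encoder: each symbol is given as a pair (output if RD-, output if RD+), the accumulated disparity starts at 0, and a value of 0 is treated as the RD- state. Those choices, and all names, are assumptions of this sketch.

```python
def disparity(codeword):
    """Number of '1's minus number of '0's (spaces are ignored)."""
    bits = codeword.replace(" ", "")
    return bits.count("1") - bits.count("0")


# (output if RD-, output if RD+), written as "jhgf iedcba" as in Figure 6.10.
K28_5 = ("0101 111100", "1010 000011")
K28_4 = ("0100 111100", "1011 000011")


def encode_stream(symbols):
    """Pick one form per symbol so the accumulated disparity stays bounded."""
    rd, out = 0, []               # assumed start: accumulated disparity 0
    for rd_minus_form, rd_plus_form in symbols:
        word = rd_plus_form if rd > 0 else rd_minus_form
        rd += disparity(word)
        out.append(word)
    return out, rd


words, final_rd = encode_stream([K28_5, K28_4, K28_5])
print(words)
print(final_rd)
```

In this run the first K28.5 is sent with disparity +2, the balanced K28.4 leaves the accumulated value unchanged, and the second K28.5 is sent in its complemented form with disparity -2, which brings the total back to zero.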

The 12 special characters, denoted by K and used for data transmission control, are also included in 
Figure 6.10. Note that three of them consist of "xxx1111100" (K28.1, K28.5, and K28.7), that is, the last seven 
LSBs are "1111100". This sequence is called +comma, while its complement, "0000011," is called —comma. 
The importance of these sequences is that they do not occur anywhere else, neither in separate codewords 


Regular codewords (only the initial eight and final eight are shown)

  Input      Output if RD-    Output if RD+
             jhgf iedcba      jhgf iedcba
  00000000   0010 111001      1101 000110
  00000001   0010 101110      1101 010001
  00000010   0010 101101      1101 010010
  00000011   1101 100011      0010 011100
  00000100   0010 101011      1101 010100
  00000101   1101 100101      0010 011010
  00000110   1101 100110      0010 011001
  00000111   1101 000111      0010 111000
  ...
  11111000   1000 110011      0111 001100
  11111001   0111 011001      1000 100110
  11111010   0111 011010      1000 100101
  11111011   1000 011011      0111 100100
  11111100   0111 011100      1000 100011
  11111101   1000 011101      0111 100010
  11111110   1000 011110      0111 100001
  11111111   1000 110101      0111 001010

Special codewords

  Code       Output if RD-    Output if RD+
             jhgf iedcba      jhgf iedcba
  K28.0      0010 111100      1101 000011
  K28.1      1001 111100      0110 000011
  K28.2      1010 111100      0101 000011
  K28.3      1100 111100      0011 000011
  K28.4      0100 111100      1011 000011
  K28.5      0101 111100      1010 000011
  K28.6      0110 111100      1001 000011
  K28.7      0001 111100      1110 000011
  K23.7      0001 010111      1110 101000
  K27.7      0001 011011      1110 100100
  K29.7      0001 011101      1110 100010
  K30.7      0001 011110      1110 100001


FIGURE 6.10. Partial encoding-decoding table for the 8B/10B code. 
nor in overlapping codewords, so they can be used as packet delimiters or to indicate an idle condition
(K28.5 is normally used as comma). 
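The statement about the comma sequences can be checked directly against the special codewords of Figure 6.10. The short Python sketch below is only illustrative (the names `SPECIAL_RD_MINUS`, `has_plus_comma`, and `complement` are ours); it lists which K codewords end with +comma and confirms that the complemented form ends with -comma.

```python
# Special codewords, "Output if RD-" column of Figure 6.10 (jhgf iedcba)
SPECIAL_RD_MINUS = {
    "K28.0": "0010 111100", "K28.1": "1001 111100", "K28.2": "1010 111100",
    "K28.3": "1100 111100", "K28.4": "0100 111100", "K28.5": "0101 111100",
    "K28.6": "0110 111100", "K28.7": "0001 111100", "K23.7": "0001 010111",
    "K27.7": "0001 011011", "K29.7": "0001 011101", "K30.7": "0001 011110",
}
PLUS_COMMA = "1111100"    # +comma
MINUS_COMMA = "0000011"   # -comma (bitwise complement of +comma)

def has_plus_comma(codeword: str) -> bool:
    """True if the seven LSBs of the codeword are the +comma sequence."""
    return codeword.replace(" ", "").endswith(PLUS_COMMA)

def complement(codeword: str) -> str:
    """Bitwise complement, keeping any spaces in place."""
    return "".join({"0": "1", "1": "0"}.get(ch, ch) for ch in codeword)

with_comma = sorted(k for k, cw in SPECIAL_RD_MINUS.items() if has_plus_comma(cw))
print(with_comma)                                  # names of comma-bearing codewords
print(complement(SPECIAL_RD_MINUS["K28.5"]))       # the RD+ form of K28.5
```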

The 8B/10B code is used, for example, in the 1000Base-X gigabit Ethernet interface (for optical fiber
communication), in which a line rate of 1.25 Gbps is required to achieve an actual rate of
1 Gbps of information bits (every 8 information bits are sent as 10 channel bits). Another application
example is in the new PCI Express standard for PCs (see Section 10.9, Figure 10.37).


6.9 PAM Codes 


Even though PAM (pulse amplitude modulation) codes employ multiple voltage levels, they operate 
very differently from MLT codes. This can be seen in the 4D-PAM5 code described below. 


4D-PAM5 code


As shown in Figure 6.2, the 4D-PAM5 code (4-dimensional PAM code with 5 voltage levels) is employed in
the 1000Base-T Ethernet interface. This interface is shown with additional details in Figure 6.11(a). The
transmitter contains a pseudo-random data scrambler (to spread the spectrum and reduce the DC component;
scramblers will be discussed in Section 14.8), a trellis encoder (a convolutional encoder that adds
redundancy for error correction, which will be discussed in Chapter 7), and finally the 4D-PAM5 encoder (which
converts the 9 bits into a 4D 5-level symbol). Likewise, the receiver contains a 4D-PAM5 decoder (which
converts the 4D 5-level symbol back into regular bits), a Viterbi decoder (to decode the convolutional code and
possibly correct some errors), and a pseudo-random descrambler (to return the bit stream to its original form).


[Figure 6.11: (a) Block diagram: the transmitter (scrambler fed with 8 bits, trellis encoder, 4D-PAM5 encoder)
and the receiver (4D-PAM5 decoder, Viterbi decoder, descrambler) are coupled through hybrids (H) to four
twisted pairs, each operating at 125 MBaud in both directions. (b) Timing diagram of the 1000Base-T Ethernet
signals in the four pairs (125 MBaud each), where each 4D symbol corresponds to 9 bits.]


FIGURE 6.11. 1000Base-T Ethernet interface illustrating the use of 4D-PAM5 code. 
The channel, which in this case consists of four twisted pairs of wires (category 5e UTP, described
earlier), operates in full-duplex mode (that is, both ends of the channel can transmit simultaneously),
which is possible due to the hybrid (H, normally constructed with a magnetic circuit plus associated
crosstalk and echo cancellers) that couples the transmitter's output to the UTP and the incoming signal
(from the UTP) to the receiver. The internal circuit delivers 1 Gbps of information bits to the 4D-PAM5
encoder. Because every eight information bits create one 4D channel symbol, such symbols must
actually be transmitted at a rate of 1 Gbps/8 = 125 Msymbols/second (125 MBaud). (Note in the figure that
two of the eight information bits are first used to create a ninth bit, called parity bit, so the nine bits
together select the symbol to be transmitted.)

The PAM5 code operates with five voltage levels, represented by {-2, -1, 0, +1, +2} (the actual values
normally are -1 V, -0.5 V, 0 V, +0.5 V, and +1 V) and illustrated in the timing diagram of Figure 6.11(b), which
shows the signals traveling along the wires. However, contrary to the MLT code seen earlier, in PAM
codes all channels (transmitters) are examined together rather than individually (so the signals in the
diverse channels are interdependent). For example, the case in Figure 6.11(b) involves four channels, and
that is the reason why it is called 4D (four-dimensional) PAM5 code.

Before we proceed with the 4D case, let us examine a simpler implementation constructed using a
2D-PAM5 code (thus with 2 channels). This code is also known as PAM5×5 because the 2D symbol
space forms a 5×5 constellation. This case is illustrated in Figure 6.12(a), which shows all 5×5 = 25
symbols that can occur when we look at both channels simultaneously. The minimum Euclidean distance
between the points is obviously just 1 unit in this case.

The effect of channel attenuation plus the pick up of noise as the signal travels along the cable is 
illustrated in Figure 6.12(b). The former causes the points to get closer, while the latter causes them 
to blur. In other words, both cause the effective separation between the points to decrease, making 
the correct identification of the transmitted symbol more difficult. 


[Figure 6.12 axes: channel 1 (horizontal) and channel 2 (vertical).]


FIGURE 6.12. (a) Symbol constellation for a 2D-PAM5 code; (b) Effect of channel attenuation and noise. 
To combat the effects above (channel attenuation and noise), one solution is not to use the whole
symbol constellation but only a subset with higher Euclidean distance. The splitting of the 2D-PAM5
constellation into subconstellations is depicted in Figure 6.13. First, the five PAM5 logical values are
divided into two subsets, called X = (-1, +1) and Y = (-2, 0, +2). Note in the complete constellation of
Figure 6.13(a) that different symbols are used to represent points with coordinates of types XX, YY,
XY, and YX. Level-1 splitting is shown in Figures 6.13(b)-(c). The upper subset is said to be even and
the lower is said to be odd because they have an even or odd number of coordinates coming from X,
respectively. In these two subsets, the minimum Euclidean distance is $\sqrt{2}$ (or minimum squared distance
equal to 2). Level-2 splitting is shown in Figures 6.13(d)-(g), now leading to subsets whose minimum
Euclidean distance is 2 (minimum squared distance equal to 4).
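The subset sizes and distances quoted for the 2D-PAM5 constellation can be reproduced by brute force. The Python sketch below is only an illustration (the helper names `kind` and `min_sq_dist` are ours): it builds the 25 points, performs the level-1 and level-2 splittings, and computes the minimum squared Euclidean distance in each subset.

```python
from itertools import combinations, product

LEVELS = (-2, -1, 0, 1, 2)
X = {-1, 1}                      # the remaining levels form Y = (-2, 0, +2)

def kind(point):
    """Type of a point, e.g. 'XY' when the first coordinate is in X and the second in Y."""
    return "".join("X" if c in X else "Y" for c in point)

def min_sq_dist(points):
    """Minimum squared Euclidean distance over all pairs of points."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q in combinations(points, 2))

full = list(product(LEVELS, repeat=2))                      # complete constellation
even = [p for p in full if kind(p).count("X") % 2 == 0]     # level 1: XX and YY
odd = [p for p in full if kind(p).count("X") % 2 == 1]      # level 1: XY and YX
level2 = {k: [p for p in full if kind(p) == k] for k in ("YY", "XX", "YX", "XY")}

print(len(full), min_sq_dist(full))
print(len(even), min_sq_dist(even), len(odd), min_sq_dist(odd))
print({k: (len(v), min_sq_dist(v)) for k, v in level2.items()})
```

The squared distance grows from 1 (full set) to 2 (even or odd subset) to 4 (each level-2 subset), at the cost of fewer available symbols.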

Let us now consider an application where the 2D-PAM5 code described above must be used. Suppose 
that 12 bit patterns must be encoded (that is, for each pattern a symbol must be assigned). There are at 
least two solutions for this problem. The simple one is to use only the even or the odd subset of points 
(Figures 6.13(b)—(c)) because either one has enough symbols (13 and 12, respectively), with a resulting 
minimum squared distance equal to 2. What one might argue is that when one of these subsets has 
[Figure 6.13 panels: (a) Complete; (b) Even; (c) Odd; (d) Even YY; (e) Even XX; (f) Odd YX; (g) Odd XY.
Legend: distinct markers identify points of types YY, XX, YX, and XY.]


FIGURE 6.13. 2D constellation splitting to obtain subsets with larger Euclidean distances. 
been chosen, the other is simply thrown away. So maybe there is some other solution where both spaces
(even plus odd) can be used simultaneously to provide a larger Euclidean distance. This might seem 
to be a paradox at first because if the two subsets are used we are back to the full set (Figure 6.13(a)), 
where the minimum distance is just 1. The solution resides in the fact that there are many more points 
(25) than the number needed (12), so maybe we can stay in the complete symbol space but restrict 
which points can come after each point such that the minimum squared distance between any two 
sequences is greater than 2. This transformation of individual (independent) symbols into a sequence of 
dependent symbols is called convolutional encoding, and it leads to an optimal usage of the symbol space. 
Not only does it allow the distance to be optimized, but it also allows some errors to be corrected by 
the corresponding decoder (which is a Viterbi decoder—convolutional encoders, Viterbi decoder, and 
other error-correcting codes are described in Chapter 7). This is the type of solution employed in the 
4D-PAM5 encoder described next. 

We now return to the 4D-PAM5 encoder, which processes four channels instead of two. Therefore, it
is a 4D symbol constellation with a total of $5^4 = 625$ points. If all points were employed, the minimum
Euclidean distance between them would obviously be 1. To avoid that, only one byte of data is encoded at
a time, so because eight bits require only 256 symbols, these can be properly chosen among the 625
symbols available to maximize the Euclidean distance (and also provide some error correction capability).

As mentioned above, a trivial solution would be to pick just the even or the odd subset (because they
have enough points, that is, 313 and 312, respectively), with a resulting minimum squared Euclidean
distance of 2. As an example, suppose that the even subset is chosen (symbols with an even number of
Xs), and that the sequence XXXX-XXYY-XYYX occurs with coordinates (+1, +1, +1, +1), (+1, +1, 0, 0), and
(+1, 0, 0, +1), respectively. As expected, the corresponding squared distances in this case are
XXXX-to-XXYY = $(1-1)^2 + (1-1)^2 + (0-1)^2 + (0-1)^2 = 2$ and
XXYY-to-XYYX = $(1-1)^2 + (0-1)^2 + (0-0)^2 + (1-0)^2 = 2$.
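The point counts and the squared distances of this example can be verified numerically. The following Python sketch is only illustrative (the helper names `num_x` and `sq_dist` are ours): it enumerates the 625 points of the 4D constellation, separates the even and odd subsets, and evaluates the distances along the example sequence.

```python
from itertools import product

LEVELS = (-2, -1, 0, 1, 2)
X = {-1, 1}

def num_x(point):
    """Number of coordinates taken from the X subset."""
    return sum(c in X for c in point)

def sq_dist(p, q):
    """Squared Euclidean distance between two 4D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

points = list(product(LEVELS, repeat=4))
even = [p for p in points if num_x(p) % 2 == 0]
odd = [p for p in points if num_x(p) % 2 == 1]

# Minimum squared distance inside the even subset (brute force over all pairs)
min_even = min(sq_dist(p, q) for i, p in enumerate(even) for q in even[i + 1:])

# The sequence XXXX-XXYY-XYYX used in the text
sequence = [(1, 1, 1, 1), (1, 1, 0, 0), (1, 0, 0, 1)]
steps = [sq_dist(a, b) for a, b in zip(sequence, sequence[1:])]

print(len(points), len(even), len(odd), min_even, steps)
```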

As already mentioned, a better solution is achieved by staying in the full symbol space but using only 
specific paths within it. This signifies that, instead of examining individual symbols, the decoder must 
examine sequences of symbols. Although more complex, the importance of this process is that without 
reducing the minimum squared distance between consecutive symbols (still 2), two additional benefits 
arise. The first regards the fact that such sequences are chosen so that the minimum squared distance 
between them is 4, thus improving noise immunity. The second comes from the fact that because only 
specific sequences are allowed (which the decoder knows), the decoder can choose the one that is more 
likely to represent the transmitted sequence, hence correcting some of the errors that might occur during 
transmission (the decoder picks, among the allowed sequences, that with the lowest Hamming distance to 
the received one, where the Hamming distance is the number of bits in which the sequences disagree). 

The choice of sequences is done as follows. Given that all four channels must be considered at once, 
a total of 16 combinations exist (XXXX, XXXY,..., YYYY). The first step consists of grouping these 16 
cases into eight subgroups (called sublattices) by putting together those that are complements of each 
other, that is, XXXX+YYYY, XXXY+YYYX, etc. The resulting eight sublattices (called D0, D1,..., D7)
are listed in Figure 6.14(a). This grouping simplifies the encoder and decoder, and it is possible because 
the minimum squared distance between any two symbols within the same sublattice is still 4, and the 
minimum squared distance between two symbols belonging to different sublattices is still 2. 

The table in Figure 6.14(a) also shows the number of points in each sublattice, of which only 64 are 
taken, thus totaling 512 points. In summary, the 256-symbol space was converted into a 512-symbol 
space with the minimum squared distance between any two consecutive symbols equal to 4 if they 
belong to the same sublattice, or 2 if they do not, and between any two sequences equal to 4. 
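The grouping into sublattices can also be checked by enumeration. In the Python sketch below (illustrative only), each sublattice is identified by its pair of complementary types rather than by the names D0 to D7, because the exact name assignment is given only in Figure 6.14(a); the helper names are ours.

```python
from itertools import combinations, product

COORDS = {"X": (-1, 1), "Y": (-2, 0, 2)}
TYPES = ["".join(t) for t in product("XY", repeat=4)]     # XXXX, XXXY, ..., YYYY

def complement(t):
    """Swap X and Y in every position (XXXY -> YYYX)."""
    return t.translate(str.maketrans("XY", "YX"))

def points_of(t):
    """All 4D points whose coordinates follow the type t."""
    return list(product(*(COORDS[c] for c in t)))

def min_sq_dist(points):
    return min(sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q in combinations(points, 2))

sublattices = {}
for t in TYPES:
    c = complement(t)
    if t < c:                          # visit each complementary pair only once
        sublattices[f"{t}+{c}"] = points_of(t) + points_of(c)

for name, pts in sublattices.items():
    print(name, len(pts), min_sq_dist(pts))
```

Every sublattice contains at least 64 points and has minimum squared distance 4, which is what allows 64 points to be taken from each of them.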

To choose from 512 symbols, nine bits are needed, so an additional bit (called parity bit) is added to 
the original eight data bits. The parity bit is computed by the convolutional (trellis) encoder mentioned 
earlier and shown in Figure 6.14(b). 
FIGURE 6.14. (a) The eight sublattices employed in the 4D-PAM5 code; (b) Convolutional (trellis) encoder used
to generate the parity bit ($b_8$), which participates in the 4D-PAM5 encoding procedure; (c) 4D-PAM5 encoder.

FIGURE 6.15. Summary of 1000Base-T transmitter operation, which employs the 4D-PAM5 encoder.


The 4D-PAM5 encoder is presented in Figure 6.14(c). Bits $b_0 \ldots b_7$ are data bits, while $b_8$ is the parity
bit. The splitting is similar to that in Figure 6.13, that is, level-1 separates the symbols into even and
odd sets (with minimum squared distance equal to 2), while level-2 further splits the space into the
eight sublattices listed in Figure 6.14(a) (with minimum squared distance equal to 4). The operation
occurs in the full symbol space with the splitting used only to construct the sequences. As shown
in Figure 6.14(c), $b_8$ selects the family (even or odd), while $b_7b_6$ select one of the four sublattices in
that family, and finally $b_5b_4b_3b_2b_1b_0$ select one of the 64 points within that sublattice. Note that the
convolutional encoder (Figure 6.14(b)), which contains three D-type flip-flops (DFFs) plus two modulo-2
adders (XOR gates), uses bits $b_7b_6$ to calculate $b_8$. (This type of encoder, along with the respective Viterbi
decoder, will be studied in Chapter 7.)
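A behavioral sketch of the parity-bit generation and of the bit-field selection is given below in Python. Note that the exact wiring of Figure 6.14(b) is not reproduced in the text, so the register arrangement used here (three flip-flops in a ring, with $b_6$ and $b_7$ injected through the two XOR gates, and $b_8$ taken after the clock edge) is an assumption; it was chosen because it reproduces the parity sequence stated in the exercise on this encoder at the end of the chapter. The function names are ours.

```python
def parity_bits(b7b6_pairs):
    """Assumed model of the trellis encoder: 3 DFFs (s0, s1, s2) plus 2 XOR gates."""
    s0 = s1 = s2 = 0                      # all flip-flops initially zero
    out = []
    for b7, b6 in b7b6_pairs:
        # One clock edge: s2 feeds s0, and the XOR gates inject b6 and b7
        s0, s1, s2 = s2, s0 ^ b6, s1 ^ b7
        out.append(s0)                    # b8 is read from s0 (assumption)
    return out

def select_fields(byte, b8):
    """Split the nine bits into (family bit, sublattice index, point index)."""
    return b8, (byte >> 6) & 0b11, byte & 0b111111

# b7b6 = "10 10 00 00 00 00 00", the input sequence used in the exercise
pairs = [(1, 0), (1, 0)] + [(0, 0)] * 5
print(parity_bits(pairs))
print(select_fields(0b10110011, 1))
```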

The description presented above is summarized in Figure 6.15, where another view of the 4D-PAM5
encoder used in the Ethernet 1000Base-T interface is shown. For every eight information bits (represented
by $b_7 \ldots b_0$ and produced at a rate of 1 Gbps), the trellis encoder must first create the ninth bit ($b_8$), then
the nine bits together are employed to select one 4D-PAM5 point. In Figure 6.15, it is assumed that
(-1, +1, 0, +2) was the point chosen. The corresponding voltage levels are then delivered to the four
twisted pairs of wires, operating at a rate of 1 Gbps/8 = 125 MBaud.

As a final note, there are 625 - 512 = 113 symbols left, some of which are used for control (idle, start
of packet, end of packet, etc.), while others are avoided (like the five “flat” ones, (-2, -2, -2, -2),...,
(+2, +2, +2, +2)).


6.10 Exercises 


1. UTP cables

a. It was mentioned that both 100Base-TX and 1000Base-T Ethernet interfaces employ unshielded
twisted pairs (UTPs) for communication, of categories 5 and 5e, respectively. However, the
maximum frequency for both is 100 MHz. Look first for the definition of cable crosstalk, then check in
the respective UTPs' data sheets what actually differentiates 5e from 5.

b. Low and high frequencies are attenuated differently as they travel along a UTP. Check in the
corresponding data sheets for the attenuation, in dB/100 m (decibels per one hundred meters), for
categories 3 and 5e UTPs for several frequencies. Is the attenuation at 1 MHz the same in both cables?

2. Channel distortion

Why do the corners of a square wave, when received at the other end of a communications
channel (a twisted pair of wires, for example), look “rounded”?

3. Unipolar codes #1

a. Given the bit sequence "1101001101", draw the corresponding waveforms for all three unipolar
codes (NRZ, RZ, NRZ-I) shown in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

4. Unipolar codes #2

a. Given the bit sequence "00000000", draw the corresponding waveforms for all three unipolar
codes (NRZ, RZ, NRZ-I) shown in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

5. Unipolar codes #3

a. Given the bit sequence "11111111", draw the corresponding waveforms for all three unipolar
codes (NRZ, RZ, NRZ-I) shown in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

d. Compare the results to those from the previous two exercises. Which bit sequences cause the
minimum and maximum number of transitions and the minimum and maximum DC levels?

6. Polar codes #1

a. Given the bit sequence "1101001101", draw the corresponding waveforms for all three polar
codes (NRZ, RZ, NRZ-I) shown in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

7. Polar codes #2

a. Given the bit sequence "00000000", draw the corresponding waveforms for all three polar codes
(NRZ, RZ, NRZ-I) shown in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

8. Polar codes #3

a. Given the bit sequence "11111111", draw the corresponding waveforms for all three polar codes
(NRZ, RZ, NRZ-I) shown in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

d. Compare the results to those from the previous two exercises. Which bit sequences cause the
minimum and maximum number of transitions and the minimum and maximum DC levels?

9. Bipolar codes #1

a. Given the bit sequence "1101001101", draw the corresponding waveforms for the two bipolar
codes (NRZ, RZ) shown in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

10. Bipolar codes #2

a. Given the bit sequence "00000000", draw the corresponding waveforms for the two bipolar codes
(NRZ, RZ) shown in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

11. Bipolar codes #3

a. Given the bit sequence "11111111", draw the corresponding waveforms for the two bipolar codes
(NRZ, RZ) shown in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

d. Compare the results to those from the previous two exercises. Which bit sequences cause the
minimum and maximum number of transitions and the minimum and maximum DC levels?

12. Biphase/Manchester codes #1

a. Given the bit sequence "1101001101", draw the corresponding encoding waveforms for the two
Manchester codes illustrated in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

13. Biphase/Manchester codes #2

a. Given the bit sequence "00000000", draw the corresponding encoding waveforms for the two
Manchester codes illustrated in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

14. Biphase/Manchester codes #3

a. Given the bit sequence "11111111", draw the corresponding encoding waveforms for the two
Manchester codes illustrated in Figure 6.5.

b. For each waveform, calculate the DC (average) voltage.

c. For each waveform, calculate the density of transitions.

d. Compare the results to those from the previous two exercises. Which bit sequences cause the
minimum and maximum number of transitions and the minimum and maximum DC levels?

15. MLT-3 code

a. Given the 12-bit sequence "110100110111", draw the corresponding MLT-3 encoded waveform
(as in Figure 6.5).

b. Calculate the waveform's DC (average) voltage.

c. Write the 12-bit sequence that causes the maximum number of transitions and draw the
corresponding MLT-3 waveform.

d. If the clock frequency in (c) is 100 MHz (thus 100 Mbps are produced), what is the frequency of
the main harmonic?

e. Repeat part (c) for the sequence that produces the minimum number of transitions.

16. MLT-5 code

a. Given the 12-bit sequence "110100110111", draw the corresponding MLT-5 encoded waveform
(MLT-5 operates with five sequential voltages, represented by ..., -2, -1, 0, +1, +2, +1, 0, -1, -2, ...).

b. Calculate the waveform's DC (average) voltage.

c. Write the 12-bit sequence that causes the maximum number of transitions and draw the
corresponding MLT-5 waveform.

d. If the clock frequency in (c) is 100 MHz (thus 100 Mbps are produced), what is the frequency of
the main harmonic?

e. Repeat part (c) for the sequence that produces the minimum number of transitions.

17. 4B/5B code #1

a. If the serial bit sequence 0FE7₁₆ is applied to the input of a 4B/5B encoder, what is the bit sequence
that it produces at the output?

b. Calculate the percentage of transitions before and after encoding.

c. Calculate the DC voltage before and after encoding.

18. 4B/5B code #2

a. If the serial bit sequence 0000₁₆ (="00...0") is applied to the input of a 4B/5B encoder, what is
the bit sequence that it produces at the output?

b. Calculate the percentage of transitions before and after encoding.

c. Calculate the DC voltage before and after encoding.

19. 4B/5B code #3

a. If the serial bit sequence 111,, (="11...1") is applied to the input of a 4B/5B encoder, what is the
bit sequence that it produces at the output?

b. Calculate the percentage of transitions before and after encoding.

c. Calculate the DC voltage before and after encoding.

20. 8B/10B code #1

Suppose that the running (accumulated) disparity of an 8B/10B encoder is +2 and that the next
word to be transmitted is "00000110". Write the 10-bit codeword that will actually be transmitted by
the encoder.

21. 8B/10B code #2

Assume that the present running disparity value of an 8B/10B encoder is +2.

a. Write the 20-bit sequence that will be transmitted if the encoder receives the bit sequence
F8FF₁₆.

b. Calculate the percentage of transitions before and after encoding.

c. Calculate the DC voltage before and after encoding.

22. 8B/10B code #3

Assume that the present running disparity value of an 8B/10B encoder is +2.

a. Write the 30-bit sequence that will be transmitted if the encoder receives the bit sequence
000003₁₆.

b. Calculate the percentage of transitions before and after encoding.

c. Calculate the DC voltage before and after encoding.

23. 2D-PAM5 code

The questions below refer to a communications channel constructed with two twisted pairs of wires,
employing a 2D-PAM5 (also called PAM5×5) encoder-decoder pair at each end operating in
full-duplex mode (simultaneous communication in both directions).

a. How many information bits are conveyed by each encoder (channel) symbol?

b. If the information bits are fed to the encoder at the rate of 200 Mbps, with what symbol rate must
the encoder operate? In other words, after how many information bits must one 2D symbol be
transmitted?

c. Suppose that the following sequence of symbols must be transmitted: (-2, +2), (0, 0), (+2, -2),
(0, 0), (-2, +2), (0, 0),.... Draw the corresponding waveform in each pair of wires (use the clock
period, T, as your time unit).

24. 4D-PAM5 code #1

Show that the trellis encoder of Figure 6.14(b), when $b_7b_6$ = "10 10 00 00 00 00 00...", produces
$b_8$ = "0 1 1 0 1 1 0..." (assuming that the initial state of all flip-flops is zero).

25. 4D-PAM5 code #2

In Figures 6.11 and 6.15, it was shown that the rate at which symbols are transmitted in 1000Base-T
Ethernet is 125 MBaud. It was also mentioned that the category 5e UTP is recommended only for
frequencies up to 100 MHz and that high-energy components above 30 MHz should be avoided (for
irradiation purposes). The questions below address these two constraints.

a. How are high-energy components prevented in 1000Base-T Ethernet? In other words, which
block in Figure 6.11 randomizes the data sequence such that the spectrum gets spread?

b. How are frequencies above 100 MHz avoided? In other words, show that in Figure 6.15, even if
the symbols transmitted in a certain pair of wires are always from the X = (-1, +1) set, the
fundamental (first) harmonic still cannot be higher than 62.5 MHz.

26. 4D-PAM5 code #3

The questions below refer to the 4D-PAM5 encoder-decoder pair used in Ethernet 1000Base-T.

a. Explain why each encoder (channel) symbol can convey a total of nine bits.

b. With what speed (in bps) are the information bits fed to the encoder?

c. After every how many information bits must the encoder transmit one 4D symbol?

d. Calculate and explain the resulting symbol rate.

e. Suppose that the following sequence of symbols is allowed and must be transmitted: (-2, -1,
+1, +2), (0, +1, -1, 0), (-2, -1, +1, +2), (0, +1, -1, 0),.... Draw the corresponding waveform in each
pair of wires using the clock period, T, of part (b) above, as the time unit.


Error-Detecting/Correcting Codes


Objective: As mentioned in the previous chapter, modern digital designs often include means for 
communications between distinct system parts, forming larger, integrated networks. Large specialized 
means for data storage are generally also required. To add protection against errors in such cases (data 
communications and storage), error-detecting/correcting codes are often employed, so a basic knowledge of 
such codes is highly desirable, and that is the purpose of this chapter. This presentation, however, con- 
stitutes only an introduction to the subject, which was designed to give the reader just the indispensable 
background and motivation. The codes described for error detection are SPC (single parity check) and 
CRC (cyclic redundancy check). The codes described for error correction are Hamming, Reed-Solomon, 
Convolutional, Turbo, and LDPC (low density parity check). Data interleaving and Viterbi decoding are also 
included. 


Chapter Contents 


7.1 Codes for Error Detection and Error Correction 
7.2 Single Parity Check (SPC) Codes 

7.3 Cyclic Redundancy Check (CRC) Codes 
7.4 Hamming Codes 

7.5 Reed-Solomon (RS) Codes 

7.6 Interleaving 

7.7 Convolutional Codes 

7.8 Viterbi Decoder 

7.9 Turbo Codes 

7.10 Low Density Parity Check (LDPC) Codes 
7.11 Exercises 


7.1. Codes for Error Detection and Error Correction 


In Chapter 2 we described several codes for representing decimal numbers (sequential binary, Gray, BCD, 
floating-point, etc.) and also codes for representing characters (ASCII and Unicode). In Chapter 6, another 
group of codes was introduced, collectively called line codes, which are used for data transmission and 
storage. A final group of binary codes is introduced in this chapter, collectively called error-detecting/ 
correcting codes. As the name says, they are used for error detection or error correction, in basically the 
same applications as line codes, that is, data transmission and data storage. 

In the case of line codes, we saw in Chapter 6 that the encoder always modifies the data sequence, 
and in some cases (mB/nB codes) it also introduces additional bits in the data stream. The purpose of 
both (data modification and bit inclusion) is to improve some of the data parameters, notably transition 
density (increases) and DC voltage (decreases). 

x = 101100...  ->  Encoder  ->  y = 011101100...  ->  Noisy channel  ->  y* = 01[0]101100...  ->  Decoder  ->  x = 101100...


FIGURE 7.1. Typical usage of error-correcting codes. 


In the case of error-detecting/correcting codes, extra bits, called redundancy bits or parity bits, are
always introduced, so the number of bits at the encoder's output (n) is always higher than at its input
(k). The original bits might be included in the new data sequence in modified or unmodified form (if the
latter occurs, the code is said to be systematic). The purpose of such codes, however, is not to improve 
data parameters, but to add protection against errors. 

The use of error-correcting codes is illustrated in Figure 7.1. The original data is x, which is converted 
into y (with more bits) by the encoder. The latter is then transmitted through a noisy channel, so a pos- 
sibly corrupted version of y (called y*) is actually received by the decoder (note that the third bit was 
flipped during the transmission). Thanks to the redundancy bits, the decoder might be able to recon- 
stitute the original data, x. 

The codes that will be seen in this chapter are listed below. 


For error detection:
  - Single parity check (SPC) codes
  - Cyclic redundancy check (CRC) codes

For error correction:
  - Hamming codes
  - Reed-Solomon (RS) codes
  - Convolutional codes and Viterbi decoder
  - Turbo codes
  - Low density parity check (LDPC) codes


7.2. Single Parity Check (SPC) Codes 


SPC codes are the simplest error-detecting codes. An extra bit is added to the original codeword so that 
the new codeword always exhibits an even (or odd) number of '1's. Consequently, the code can detect 
one error, though it cannot correct it. In such a case, the receiver can request the sender to retransmit that 
codeword. 

An example of SPC code is shown in Figure 7.2. It is called PS/2 (personal system version 2) and was 
introduced by IBM in 1987. Its main application is for serial data communication between personal com- 
puters and PS/2 devices (keyboard and mouse). 

As can be seen, the code consists of eight data bits (one byte) to which three other bits are added, called 
start, parity, and stop bits. The start bit is always '0' and the stop bit is always '1'. Odd parity is employed, 
so the parity bit must be '1' when the number of '1's in the 8-bit data vector is even. As mentioned in
Chapter 2, Scan Code Set 2 is employed to encode the keys of PS/2 keyboards, which consists of one 
or two data bytes when a key is pressed (called make code) and two or three bytes when it is released 
(called break code). In the example of Figure 7.2, the make code for the keyboard key “P” is shown, which 
is "01001101" (=4Dh). Note that the LSB is transmitted first. 

FIGURE 7.2. PS/2 protocol (11-bit codeword), which is an SPC code. In the example above, the make code for
the “P” keyboard key is shown. 
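The construction of the 11-bit PS/2 codeword can be sketched in a few lines of Python. This is only an illustration of the frame format described above (the function name `ps2_frame` is ours); the bits are listed in the order in which they are transmitted.

```python
def ps2_frame(byte: int) -> str:
    """Build the 11-bit PS/2 frame: start, 8 data bits (LSB first), odd parity, stop."""
    data = [(byte >> i) & 1 for i in range(8)]     # LSB is transmitted first
    parity = 1 - (sum(data) % 2)                   # odd parity: total number of '1's is odd
    bits = [0] + data + [parity, 1]                # start bit = '0', stop bit = '1'
    return "".join(str(b) for b in bits)

# Make code of the "P" key, 4Dh = "01001101"
print(ps2_frame(0x4D))   # 01011001011
```

Because 4Dh contains four '1's (an even number), the parity bit is '1'.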


To determine whether an error has occurred, the decoder uses a parity-check equation. In the example of
Figure 7.2, the codeword of interest has nine bits, that is, $c = c_1c_2c_3c_4c_5c_6c_7c_8c_9$, where $c_1$ to $c_8$ are the actual
data bits and $c_9$ is the parity bit (the start and stop bits do not take part in the parity computation). If
odd parity is used, then the corresponding parity-check equation is


$$c_1 \oplus c_2 \oplus c_3 \oplus c_4 \oplus c_5 \oplus c_6 \oplus c_7 \oplus c_8 \oplus c_9 = 1, \qquad (7.1)$$


where $\oplus$ represents modulo-2 addition (XOR operation). This equation is true only when an odd
number of '1's exist in c. Consequently, any odd number of errors will be detected, while any even number of
errors (including zero) will cause the received codeword to be accepted as correct. In summary, this type
of code is guaranteed to detect a single error, though it cannot correct it.
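The behavior of the parity-check equation under one and two errors can be illustrated as follows. This Python sketch is only illustrative (the names `parity_check_ok` and `flip` are ours); the 9-bit word used is the data byte 4Dh (LSB first) followed by its odd-parity bit.

```python
def parity_check_ok(bits9: str) -> bool:
    """Evaluate the odd-parity check: the XOR of all nine bits must be 1."""
    acc = 0
    for ch in bits9:
        acc ^= int(ch)
    return acc == 1

def flip(bits: str, *positions: int) -> str:
    """Return a copy of bits with the given positions inverted (channel errors)."""
    out = list(bits)
    for p in positions:
        out[p] = "1" if out[p] == "0" else "0"
    return "".join(out)

sent = "101100101"                            # 4Dh, LSB first, plus parity bit '1'
print(parity_check_ok(sent))                  # True: no error
print(parity_check_ok(flip(sent, 2)))         # False: single error is detected
print(parity_check_ok(flip(sent, 2, 3)))      # True: double error goes unnoticed
```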


7.3. Cyclic Redundancy Check (CRC) Codes 


CRC is a very popular method for detecting errors during data transmission in computer-based applica- 
tions. It consists of including, at the end of a data packet, a special binary word, normally with 16 or 32 
bits, referred to as CRC value, which is obtained from some calculations made over the whole data block 
to be transmitted. The receiver performs the same calculations over the received data and then compares 
the resulting CRC value against the received one. Even though not all error patterns cause a wrong 
CRC, most do. This code is used, for example, in the IEEE 802.3 Ethernet frame, which can contain over 
12 kbits with a 32-bit CRC added to it (hence a negligible overhead). 
The CRC code is identified by a generator polynomial, g(x). Some common examples are listed below. 


CRC-8 (8 bits): g(x) = x^8 + x^2 + x + 1    (7.2) 

CRC-16 (16 bits): g(x) = x^16 + x^15 + x^2 + 1    (7.3) 

CRC-CCITT-16 (16 bits): g(x) = x^16 + x^12 + x^5 + 1    (7.4) 

CRC-32 (32 bits): g(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1    (7.5) 


To calculate the CRC, the data string, d(x), is simply divided by the generator polynomial, g(x), from which 
q(x) (quotient) and r(x) (remainder) result. The remainder (not the quotient) is the CRC value. 

                           01100011   = q(x) 
g(x) = 100000111 | 0110001000000000   = d(x) 
                    100000111 
                    0100011110 
                     100000111 
                     0000110010000 
                         100000111 
                         0100101110 
                          100000111 
                          000101001   = r(x) 

d(x) = x^14 + x^13 + x^9 
g(x) = x^8 + x^2 + x + 1 
q(x) = x^6 + x^5 + x + 1 
r(x) = x^5 + x^3 + 1 


FIGURE 7.3. Example of CRC calculation (mod2 polynomial division). 


[Figure 7.4 input labels: "1011000110011..." (N bits of d(x), MSB first) followed by "0000000000000000" 
(n=16 zeros), with N>>n.] 


FIGURE 7.4. Circuit for the CRC-CCITT-16 encoder, whose generator polynomial is g(x) = 1 + x^5 + x^12 + x^16. The 
boxes marked from 1 to 16 are D-type flip-flops connected to form shift registers (Section 4.11). 


If the degree of g(x) is n, then the degree of r(x) can only be smaller than n. In practice, r(x) is always 
represented with degree n-1 (that is, with n bits), with the MSBs filled with '0's when the actual degree 
is smaller than n-1. After the whole data vector has passed through the CRC calculator, a string with 
n '0's must be entered to complete the calculations. This is illustrated in the example below. 

To make the example simple, suppose that d(x) contains only 8 bits, "01100010", and that CRC-8 is chosen 
(in practice, the degree of d(x) is much higher than that of g(x), even by several orders of magnitude). 
After appending n=8 '0's to the right of d(x), the polynomial becomes d(x) = "01100010 00000000" = 
x^14 + x^13 + x^9. The division of d(x) by g(x) = x^8 + x^2 + x + 1 is shown in Figure 7.3, from which 
q(x) = x^6 + x^5 + x + 1 = "01100011" and r(x) = x^5 + x^3 + 1 = "00101001" result. Hence CRC = r(x) = "00101001". 
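
The modulo-2 division of this example can be reproduced with a short Python sketch. It is a direct, unoptimized transcription of the long division (the function name `crc_remainder` is hypothetical), intended only to make the steps explicit.

```python
def crc_remainder(data_bits: str, generator_bits: str) -> str:
    """Divide d(x), extended with n zeros, by g(x) modulo 2 and return the n-bit remainder."""
    n = len(generator_bits) - 1            # degree of g(x)
    g = int(generator_bits, 2)
    reg = int(data_bits, 2) << n           # append n '0's to the right of d(x)
    total_bits = len(data_bits) + n
    # Scan from the MSB down; whenever the leading bit is '1', subtract (XOR) the aligned g(x).
    for shift in range(total_bits - 1, n - 1, -1):
        if reg & (1 << shift):
            reg ^= g << (shift - n)
    return format(reg, f"0{n}b")


# CRC-8 example from the text: d(x) = "01100010", g(x) = x^8 + x^2 + x + 1
print(crc_remainder("01100010", "100000111"))   # 00101001
```

Only the remainder is kept, which is why the quotient is never stored in the loop.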

Even though this operation might look difficult to implement, the physical circuit is very simple. 
As illustrated in Figure 7.4, which shows the circuit for the CRC-CCITT-16 encoder, it requires only an 
n-stage shift register (Section 4.11) plus a few XOR gates (Section 4.5). Because in this case the generator 
polynomial is g(x) = 1 + x^5 + x^12 + x^16, the XOR gates are located at the outputs of flip-flops 5, 12, and 
16. Note that the MSB of d(x) is entered first, and that n (=16 in this example) zeros must be included 
at the end of the data string. After the last zero has been entered, the CRC value will be stored in the 
16 flip-flops, with the MSB in flip-flop 16. As indicated in the figure, N>>n (recall the 802.3 Ethernet 
frame mentioned above, whose N can be as high as ~12 kbits, while n is just 32 bits). 
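
A bit-serial software model of such a shift-register circuit is sketched below in Python. It assumes that all flip-flops start cleared and that bit i-1 of the integer `reg` plays the role of flip-flop i; the function name `crc_ccitt_serial` is hypothetical. Practical CRC-CCITT variants often use a nonzero initial value or bit reflection, which this sketch does not model.

```python
def crc_ccitt_serial(data_bits: str) -> int:
    """Clock d(x) (MSB first) plus 16 trailing zeros through a 16-stage feedback shift register."""
    poly = 0x1021   # x^12 + x^5 + 1; the x^16 term is the feedback taken from flip-flop 16
    reg = 0         # assumption: all flip-flops are initially reset
    for ch in data_bits + "0" * 16:
        feedback = (reg >> 15) & 1                 # output of flip-flop 16
        reg = ((reg << 1) & 0xFFFF) | int(ch)      # shift by one stage, new bit enters flip-flop 1
        if feedback:
            reg ^= poly                            # XOR gates feeding flip-flops 1, 6, and 13
    return reg


print(hex(crc_ccitt_serial("1")))   # 0x1021, that is, x^16 mod g(x) = x^12 + x^5 + 1
```

Each loop iteration corresponds to one clock cycle, so N+16 cycles are needed for an N-bit data string.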


7.4 Hamming Codes 


The two codes described above are only for error detection. Hamming codes, like all the others that 
follow, allow error correction, so they are more complex than those above. 

Hamming codes [Hamming50] are among the simplest error-correcting codes. The encoded codewords 
differ in at least three bit positions with respect to each other, so the code is said to exhibit a minimum 
Hamming distance d_min = 3. Consequently, if during the transmission of a codeword the communication 
channel introduces one error (that is, flips the value of one bit), the receiver will be able to unequivocally 
correct it. 

Error-correcting codes are normally represented using the pair (n, k), where k is the number of information 
bits (that is, the size of the original information word, u = u_1 u_2 ... u_k, that enters the encoder) and n is 
the number of encoded bits (that is, the size of the codeword, c = c_1 c_2 ... c_n, that leaves the encoder). 
Therefore, m = n-k is the number of redundancy (or parity) bits. The ratio r = k/n is called the rate of the code. 

For any integer m>1, a Hamming code exists, with n = 2^m - 1, k = 2^m - m - 1, r = k/n, and a total of M = 2^k 
codewords. Some examples are listed below (all with d_min = 3). 


For m=3: n=7, k=4, r=4/7, M=16 → (7, 4) Hamming code 
For m=4: n=15, k=11, r=11/15, M=2048 → (15, 11) Hamming code 
For m=5: n=31, k=26, r=26/31, M=67,108,864 → (31, 26) Hamming code 


The actual implementation will fall in one of the following two categories: nonsystematic (the parity 
bits are mixed with the information bits) or systematic (the parity bits are separated from the information 
bits). An example of the latter is shown in the table of Figure 7.5(a), with the original words (called 
u, with k=4 bits) shown on the left and the encoded words (called c, with n=7 bits) on the right. Note 
that this code is systematic, because c_1 c_2 c_3 c_4 = u_1 u_2 u_3 u_4. Note also that d ≥ 3 between any two Hamming 
codewords, and that the original words include the whole set of k-bit sequences. 

Because the code of Figure 7.5 must add m=3 parity bits to each k-bit input word, three parity-check 
equations are needed, which, in the present example, are: 


c_5 = c_1 ⊕ c_2 ⊕ c_3    (7.6a) 
c_6 = c_2 ⊕ c_3 ⊕ c_4    (7.6b) 
c_7 = c_1 ⊕ c_2 ⊕ c_4    (7.6c) 


[Figure 7.5 panels: (a) table of the original words (u_1 u_2 u_3 u_4) and the corresponding Hamming-encoded 
codewords (c_1 c_2 ... c_7); (b) parity-check matrix H = [A I_m]; (c) generator matrix G = [I_k A^T].] 


FIGURE 7.5. (a) Input and output codewords of a cyclic-systematic (7, 4) Hamming encoder; Corresponding 
(b) parity check and (c) generator matrices. 
Above, ⊕ again represents modulo-2 addition (or XOR operation, which produces output='0' when 
the number of ones at the input is even, or '1' when it is odd). 
The parity-check equations above can be rewritten as: 


c_1 ⊕ c_2 ⊕ c_3 ⊕ c_5 = 0    (7.7a) 
c_2 ⊕ c_3 ⊕ c_4 ⊕ c_6 = 0    (7.7b) 
c_1 ⊕ c_2 ⊕ c_4 ⊕ c_7 = 0    (7.7c) 


From these equations, an equivalent representation, using a matrix, results. It is called parity-check 
matrix (H), and it contains m=3 rows (one for each parity-check equation) and n=7 columns. Figure 7.5(b) 
shows H for the code described by the equations above (whose codewords are those in Figure 7.5(a)). 
Note, for example, that row 1 of H has ones in positions 1, 2, 3, and 5, because those are the nonzero 
coefficients in the first of the parity-check equations above. 

Because the code in Figure 7.5 is systematic, H can be divided into two portions: the left portion, called 
A, corresponds to the original (information) bits and shows how those bits participate in the parity-check 
equations; the right portion, called I_m, corresponds to the redundancy bits and is simply an identity 
matrix of size m. Therefore, H can be written as H = [A I_m]. (Note: Any given matrix H can be converted 
into the format shown in Figure 7.5(b) by applying Gauss-Jordan transformation; however, this normally 
includes column combinations/permutations, so an equivalent but different set of codewords will be 
generated.) 

The major use of H is for decoding because it contains the parity-check equations. Because each row of 
H is one of these equations, if c is a valid codeword then the following results (where c^T is the transpose 
of vector c): 


H c^T = 0    (7.8) 


Still another representation for a linear code is by means of a generator matrix (G). As illustrated in 
Figure 7.5(c), in the case of systematic codes it is constructed from H using A^T (the transpose of A) and 
an identity matrix of size k (I_k), that is, G = [I_k A^T]. 

While H is used for decoding the codewords, G is used for generating them. As in any linear code, the 
codewords are obtained by linear combinations among the rows of G (that is, row1, row2, ..., row1+row2, 
..., row1+row2+row3+...). Because there are k rows in G, and the original words (u) are k bits long and 
include all k-bit sequences, the direct multiplication of u by G produces a valid codeword (thus the name 
for G), that is: 


c = uG    (7.9) 


In summary, G is used for generating the codewords (because c = uG), while H is used for decoding 
them (because H c^T = 0). Moreover, if the code is systematic, then G = [I_k A^T] and H = [A I_m], where A is the 
m-by-k matrix containing the coefficients of the parity-check equations and I_k and I_m are identity matrices 
of sizes k and m, respectively. 

The Hamming encoding-decoding procedure is illustrated in Figure 7.6, which shows the encoder on 
the left and the decoder on the right, interconnected by some type of (noisy) communications channel. 
The encoder receives the information word, u, and converts it into a Hamming codeword, c, using G 
(that is, c = uG). The decoder receives c*, which is a possibly corrupted version of c. It first computes 
the syndrome, s, using H (that is, s = H c*^T). If s = "00...0", then no error has occurred, and c** = c* is sent 
out, from which u is retrieved. Otherwise, if exactly one error has occurred, s will be equal to one of the 
columns of H. Suppose that it is the ith column; then the error is in the ith bit of c*, which must then 
be reversed. 
[Figure 7.6 diagram: Hamming encoder, channel, and Hamming decoder (syndrome computation followed by the 
correction algorithm), with output c** (u).] 


FIGURE 7.6. Hamming encoding-decoding procedure. The encoder converts u (information codeword) into c 
(Hamming codeword) using G. A possibly corrupted version of c (c*) is received by the decoder, which computes 
the syndrome, s = H c*^T, based on which the algorithm constructs c** (the decoded version of c), from which u 
(hopefully without errors) is recovered. 


As an example, consider that u="0111", so c="0111010", and suppose that the channel corrupts the 
fourth bit of c (from the left), so c*="0110010" is received. The computed syndrome is s="011", which 
coincides with the fourth column of H (Figure 7.5(b)), indicating that the fourth bit of c* must be corrected, 
thus resulting in c**="0111010". Taking the first four bits of c**, u="0111" results. 
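
The encoding and syndrome-decoding steps of this example can be sketched in Python as follows. The matrix A used here is an assumption: it was reconstructed from the properties stated in the text (row 1 of H has ones in positions 1, 2, 3, and 5; the fourth column of H is "011"; u="0111" maps to c="0111010"), so it should be checked against Figure 7.5 before being relied upon. The function names are hypothetical.

```python
# Assumed coefficient matrix A (m=3 rows, k=4 columns), reconstructed from the text
A = [[1, 1, 1, 0],
     [0, 1, 1, 1],
     [1, 1, 0, 1]]
M_BITS, K_BITS = 3, 4
N_BITS = K_BITS + M_BITS

# H = [A I_m] and G = [I_k A^T]
H = [A[r] + [int(r == j) for j in range(M_BITS)] for r in range(M_BITS)]
G = [[int(i == j) for j in range(K_BITS)] + [A[r][i] for r in range(M_BITS)]
     for i in range(K_BITS)]


def encode(u):
    """c = uG (modulo 2)."""
    return [sum(u[i] * G[i][j] for i in range(K_BITS)) % 2 for j in range(N_BITS)]


def syndrome(c_rx):
    """s = H c*^T (modulo 2)."""
    return [sum(H[r][j] * c_rx[j] for j in range(N_BITS)) % 2 for r in range(M_BITS)]


def decode(c_rx):
    """Correct at most one error and return (u, c**)."""
    s = syndrome(c_rx)
    c = list(c_rx)
    if any(s):
        columns = [[H[r][j] for r in range(M_BITS)] for j in range(N_BITS)]
        c[columns.index(s)] ^= 1      # the syndrome matches the column of the erroneous bit
    return c[:K_BITS], c


c = encode([0, 1, 1, 1])
print(c)                                   # [0, 1, 1, 1, 0, 1, 0]
print(syndrome([0, 1, 1, 0, 0, 1, 0]))     # [0, 1, 1]
print(decode([0, 1, 1, 0, 0, 1, 0])[0])    # [0, 1, 1, 1]
```

Because all seven columns of H are distinct and nonzero, every single-bit error produces a unique syndrome.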

Just as a final note, Hamming codes can also be cyclic (besides being systematic), meaning that any 
circular shift of a codeword results in another codeword (check, for instance, the code of Figure 7.5). 
The advantage in this case is that the encoder and decoder can be implemented using shift registers 
(Section 4.11), which are simple and fast circuits. 


7.5 Reed-Solomon (RS) Codes 


The Reed-Solomon code [Reed60] is a powerful burst error-correcting code largely used in storage media 
(CDs and DVDs) and wireless communications systems. Its operation is based on blocks of symbols rather 
than blocks of bits. The notation (n, k) is again used, but now it indicates that the code contains a total of 
n symbols in each block, among which k are information symbols (hence m = n-k are parity symbols). 
This code can correct m/2 symbols in each block. If each symbol is composed of b bits, then 2^b - 1 symbols 
must be included in the block, resulting in codewords of length n = 2^b - 1 symbols (that is, b(2^b - 1) bits). 

A popular example is the (255, 223) RS code, whose symbols are b=8 bits wide (1 byte/symbol). 
It contains 223 bytes of information plus 32 bytes of redundancy, being therefore capable of correcting 
16 symbols in error per block. 
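
The relations among n, k, m, and b can be checked with a small Python helper. This is only arithmetic on the code parameters (the function name `rs_parameters` and the dictionary keys are hypothetical); it does not implement the RS encoder or decoder.

```python
def rs_parameters(n: int, k: int, b: int) -> dict:
    """Derive the basic figures of an (n, k) Reed-Solomon code with b-bit symbols."""
    m = n - k                               # parity symbols per block
    return {
        "parity_symbols": m,
        "correctable_symbols": m // 2,      # the code corrects m/2 symbols per block
        "rate": k / n,
        "full_length": 2 ** b - 1,          # symbols per block in the non-shortened code
        "shortened": n != 2 ** b - 1,
        "block_bits": n * b,
    }


print(rs_parameters(255, 223, 8))
# parity_symbols = 32, correctable_symbols = 16, full_length = 255, shortened = False
```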

The fact that RS codes can correct a symbol independently from the number of bits that are wrong 
in it makes them appropriate for applications where errors are expected to occur in bursts rather than 
individually. One of the first consumer applications using this code was in music CDs, where errors are 
expected to occur in bursts (due to scratches or stains, for example). The encoding, in this case, is a very 
specialized one, as described below. 


Audio CD encoding 


Figure 7.7(a) illustrates the error-correcting encoding used in conventional audio CDs (commercial 
music CDs), which consists of a dual RS encoding. The primary code is a (32, 28) RS whose symbols are 
8 bits wide (bytes). This is depicted in the top box of Figure 7.7(a), where the dark portion (with 28 B, 
where B means bytes) represents the information bytes, while the clear portion (with 4 B) represents the 
parity bytes. Each of these 32 B symbols then becomes a single symbol for the secondary code, a (28, 24) 
RS, represented by the whole set of boxes of Figure 7.7(a). In summary, on average, for every three 
information bytes, one parity byte is added, so the code’s rate is 0.75 (note that the dark area is 75% of 
the total area). 

Note that these two RS codes do not follow the rule described earlier, which says that 2^b - 1 symbols 
must be included in each block, where b is the symbol size. These are actually special versions, called 
shortened RS codes. 

Between the two RS encodings, however, interleaving (explained in the next section—not shown in 
Figure 7.7) is applied to the data with the purpose of spreading possible error bursts. In other words, 
consecutive information bytes are not placed sequentially in the CD, so a scratch or other long-duration 
damage will not affect a long run of consecutive bytes, thus improving the error-correcting capability. 
This combination of dual RS encoding plus interleaving is called CIRC (cross-interleaved Reed-Solomon 
code), which is capable of correcting error bursts of nearly 4k bits (caused by a 2.7-mm-long scratch, for 
example). 

The complete sequence of data manipulations that occur before the audio is finally recorded on a 
CD is illustrated in Figure 7.7(b). Recall that audio signals are sampled at 44.1 kHz with 16 bits/sample 
in each channel (stereo system). A data frame is a collection of six samples, that is, (6 samples x 16 bits/ 
sample per channel x 2 channels) = 192 b = 24 B (where b=bit, B=byte). This frame, after CIRC encoding, 
becomes 32 B wide (recall that, for every 3 B of information, 1 B of parity is added). 

Next, a subcoding byte is added to each frame, so the new frame size is 33 B. Each bit of the subcoding 
byte belongs to a different channel, so there are eight channels that are constructed with 98 bits, picked 
one from each frame in a 98-frame block (the construction of blocks is explained below). However, only 
two of these eight channels are currently used, called P (for music separation and end of CD) and Q (disk 
contents, playing time, etc.). 


[Figure 7.7 diagram labels: 6 x 16 x 2 = 192 b = 24 B; primary code = (32, 28) RS, with 28 B of information 
and 4 B of parity; subcode.] 


FIGURE 7.7. Audio CD recording. (a) Simplified diagram illustrating the CIRC encoding, capable of correcting 
error bursts of nearly 4k bits; (b) Detailed frame construction (CIRC + Subcode + EFM + Sync), which contains 
six 16-bit samples of stereo data (192 information bits out of 588 frame bits); (c) A CD block or sector, which 
contains 98 frames. 


The 33 B frame is then passed through an EFM (eight-to-fourteen modulation—explained below) 
encoder, which converts every 8 bits into 14 bits, then adds three more separation bits, hence resulting in 
an 8-to-17 conversion. Thus the new total is (33 bytes x 8 bits /byte x 17/8 for EFM) =561 bits. 

EFM is used to relax the optical specifications on pits and lands (which can be damaged during han- 
dling). It is simply a conversion table where, in all codewords, two consecutive '1's are separated by at 
least two and at most ten '0's. The actual encoding then used to construct the pits is similar to NRZ-I 
(Sections 6.3-4), where a '1' is represented by a pit to land or land to pit transition, while a '0' corre- 
sponds to no transition. Because two consecutive '0's become three consecutive nontransition time slots 
after NRZ-I encoding, the actual minimum distance between pits is 3 lands (3 clock cycles), which is the 
intended specification. The upper limit in the number of '0's (10) then becomes 11 nontransitions, which 
is still small enough to guarantee proper data recovery in the CD player. Additionally, between two EFM 
codewords a 3-bit space is left, so the code is actually an 8-to-17 converter. 

Finally, a 27-bit synchronization word is added to each frame. This word is unique, not occurring in 
any EFM codeword or overlapping codewords, so it is used to indicate the beginning of a frame. This 
completes the frame assembling with a total of 588 bits. In summary, 192 information bits are converted 
into 588 disk (or channel) bits, called a frame, which is the smallest CD entity. The actual information rate 
in an audio CD is then just 33%. 

As shown in Figure 7.7(c), the frames (with 588 bits each, of which 192 are information bits) are 
organized into sectors (or blocks) of 98 frames. Therefore, given that the playing speed of an audio CD 
is 75 sectors/second, the following playing information rate results: 75 sectors/second x 98 frames/ 
sector x 192 bits/frame = 1,411,200 bps. 

To verify whether the speed above is the appropriate speed, we can compare the information rate 
with the sampling rate (they must obviously match), which is the following: 44,100 samples/second x 16 
bits/sample per channel x 2 channels = 1,411,200 bps. 
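
The frame bookkeeping described in this section can be retraced with a few lines of Python. The variable names are chosen here only for illustration; all numbers come from the text above.

```python
# Frame construction: six stereo samples of 16 bits each
info_bits = 6 * 16 * 2                 # 192 bits
info_bytes = info_bits // 8            # 24 B
after_secondary = info_bytes + 4       # (28, 24) RS -> 28 B
after_primary = after_secondary + 4    # (32, 28) RS -> 32 B
with_subcode = after_primary + 1       # one subcoding byte -> 33 B
efm_bits = with_subcode * 17           # 14 EFM bits + 3 separation bits per byte -> 561
frame_bits = efm_bits + 27             # 27-bit synchronization word -> 588

print(frame_bits)                          # 588
print(round(info_bits / frame_bits, 3))    # 0.327, that is, roughly 33%

# Playing rate versus sampling rate
playing_rate = 75 * 98 * info_bits     # sectors/s x frames/sector x information bits/frame
sampling_rate = 44_100 * 16 * 2        # samples/s x bits/sample x channels
print(playing_rate, sampling_rate)         # 1411200 1411200
```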

A final observation is that, besides being able to correct error bursts of nearly 4k consecutive bits, 
the CD player is also able to conceal error bursts of up to ~13 kbits. This, however, has nothing to do 
with the encoding schemes; it is rather achieved by interpolation with the missing (unrecoverable) 
values replaced with the average of their neighbors, thus introducing errors, but very unlikely to be 
perceived by the human ear. (Some other aspects regarding CD encoding will be seen in the exercises 
section.) 


7.6 Interleaving 


Interleaving is not an error-correcting code; it is just a permutation procedure whose main goal is to 
separate neighboring bits or blocks of bits. This process is essential for the performance of certain codes, 
like RS and turbo, so interleaving is not used alone but as an auxiliary component to other codes (as in 
the CD example above). 

In general, interleaving uses a simple deterministic reordering procedure, like that illustrated in the 
example below, which is employed to break up error bursts. 

Consider an encoder that produces 4-bit codewords, which are transmitted over a communications 
channel to a corresponding decoder that is capable of correcting one error per codeword. This is 
illustrated in Figure 7.8(a), where a sequence of four 4-bit codewords is shown. Say that a 3-bit error 
burst occurs during transmission, corrupting the last three bits of codeword c (that is, c_2 c_3 c_4). Due to 
the decoder's limitation, this codeword cannot be corrected. Consider now the situation depicted in 
Figure 7.8(b), where the same codewords are transmitted and the same error burst occurs. However, 
before transmission, interleaving is applied to the codewords. The consequence of this procedure is that 
the errors are spread by the de-interleaver, resulting in just one error per codeword, which can then be 
properly corrected by the decoder. 


(a) Without interleaving: 
    a1a2a3a4 b1b2b3b4 c1c2c3c4 d1d2d3d4 
      transmit 
    a1a2a3a4 b1b2b3b4 c1c2c3c4 d1d2d3d4 
      decode 
    a1a2a3a4 (OK)   b1b2b3b4 (OK)   c1c2c3c4 (wrong)   d1d2d3d4 (OK) 

(b) With interleaving: 
    a1a2a3a4 b1b2b3b4 c1c2c3c4 d1d2d3d4 
      interleave 
    a1b1c1d1 a2b2c2d2 a3b3c3d3 a4b4c4d4 
      transmit 
    a1b1c1d1 a2b2c2d2 a3b3c3d3 a4b4c4d4 
      de-interleave 
    a1a2a3a4 b1b2b3b4 c1c2c3c4 d1d2d3d4 
      decode 
    a1a2a3a4 (OK)   b1b2b3b4 (OK)   c1c2c3c4 (OK)   d1d2d3d4 (OK) 


FIGURE 7.8. Interleaving used for spreading error bursts. (a) System without interleaving; (b) System with 
interleaving. 


FIGURE 7.9. (a) Block and (b) pseudo-random interleavers. 


Two other types of interleavers are depicted in Figure 7.9. In Figure 7.9(a), a block interleaver is shown, 
consisting of a two-dimensional memory array to which data are written vertically and from which they 
are read horizontally. In Figure 7.9(b), a pseudo-random interleaver is presented, which consists of 
rearranging the bits in a pseudo-random order. Note that the bits are not modified in either case; they 
are just repositioned. 
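
The burst-spreading effect of a small block interleaver can be sketched in Python as follows. The sketch assumes a 4 x 4 array in which each codeword occupies one row and the array is read out column by column (so de-interleaving is the same transposition); the function names are hypothetical, and each symbol is represented only by the index of the codeword it belongs to.

```python
def transpose(block):
    """Write the words as rows, read them out as columns (also undoes itself)."""
    return [list(col) for col in zip(*block)]


def errors_per_word(burst_start, burst_len, use_interleaving, words=4, bits=4):
    """Count how many bits of each codeword are hit by a burst in the transmitted stream."""
    block = [[w] * bits for w in range(words)]        # label every bit with its codeword index
    sent = transpose(block) if use_interleaving else block
    stream = [label for word in sent for label in word]
    hit = stream[burst_start:burst_start + burst_len]
    return [hit.count(w) for w in range(words)]


# A 3-bit burst over transmitted positions 9, 10, and 11 (counting from 0)
print(errors_per_word(9, 3, use_interleaving=False))   # [0, 0, 3, 0]
print(errors_per_word(9, 3, use_interleaving=True))    # [0, 1, 1, 1]
```

Without interleaving, the third codeword receives all three errors; with interleaving, no codeword receives more than one, which a single-error-correcting decoder can handle.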

Data interleaving is sometimes confused with data scrambling. The first difference between them is that 
the latter is a randomization procedure (performed pseudo-randomly), while the former can be determin- 
istic (as in Figure 7.8). The second fundamental difference is that interleaving does not modify the data 
contents (it only modifies the bit positions), while scramblers do modify the data (a different number of 
ones and zeros normally results). (Scramblers will be studied in Section 14.8.) 
7.7 Convolutional Codes 


Contrary to the Hamming and RS codes seen above, which operate with blocks of bits or symbols, 
convolutional codes operate over a serial bit stream, processing one bit (or a small group of bits) at a time. 

An (n, k, K) convolutional code converts k information bits into n channel bits, where, as before, 
m = n-k is the number of redundancy (parity) bits. In most applications, k=1 and n=2 are employed 
(that is, for every bit that enters the encoder, two bits are produced at the output), so the code rate 
is r=1/2. The third parameter, K, is called constraint length because it specifies the length of the 
convolved vectors (that is, the number of bits from x, the input stream, that are used in the computations). 
Consequently, in spite of the encoding being serial, it depends on K-1 past values of x, which are kept 
in memory. 

The name “convolutional” comes from the fact that the encoder computes the convolution between its 
impulse response (generator polynomial) and the input bit stream (see Equation 7.10 below). This can be 
observed with the help of the general architecture for convolutional encoders depicted in Figure 7.10(a). 
The upper part shows a shift register (Section 4.11), whose input is x, in which the last K values of x are 
stored. The encoder’s output is y. Note the existence of a switch at the output, which assigns n values to 
y for every k (=1 in this case) bits of x, resulting in a code with rate k/n. The K past values of x (stored in 
the shift register) are multiplied by the encoder’s coefficients, h_{ij}, which are either '0' or '1', and are then 
added together (modulo 2) to produce y_i (i = 1, ..., n), that is: 


y_i = \sum_{j=1}^{K} h_{ij} x_{-j}    (7.10) 


FIGURE 7.10. (a) General architecture for (n, k=1, K) convolutional encoders showing a K-stage shift register 
at the top, K encoder coefficients (h_{ij} = '0' or '1') per row, n rows, an array of modulo-2 adders (XOR gates), and 
an output switch that assigns n values to y for every bit of x (k=1 in this case); (b) A popular implementation 
with k=1, n=2, and K=7. 


A popular implementation, with k=1, n=2, and K=7 (used by NASA in the Voyager spacecraft) is 
shown in Figure 7.10(b). Note again that K (constraint length) is the number of bits from x that participate 
in the computations of y (the larger K, the more errors the code can correct). The coefficients in this 
example are h_1 = (h_{11}, h_{12}, ..., h_{17}) = (1, 1, 1, 1, 0, 0, 1) and h_2 = (h_{21}, h_{22}, ..., h_{27}) = (1, 0, 1, 1, 0, 1, 1). 
Therefore: 


y_1 = h_{11} x_{-1} + h_{12} x_{-2} + ... + h_{17} x_{-7} = x_{-1} + x_{-2} + x_{-3} + x_{-4} + x_{-7}    (7.11) 
y_2 = h_{21} x_{-1} + h_{22} x_{-2} + ... + h_{27} x_{-7} = x_{-1} + x_{-3} + x_{-4} + x_{-6} + x_{-7}    (7.12) 


Note that the leftmost stage in both diagrams of Figure 7.10 can be removed, resulting in a shift register 
with K-1 flip-flops instead of K. In this case, the inputs to the adders must come from (x, x_{-1}, x_{-2}, ..., 
x_{-(K-1)}), that is, the current value of x must participate in the computations. This produces exactly the 
same results as the circuit with K stages, just one clock cycle earlier. However, it is unusual in synchronous 
systems to store only part of a vector, so the option with K stages might be preferred. 

Both options (with K-1 and K flip-flops) are depicted in Figure 7.11, which shows the smallest 
convolutional code of interest, with k=1, n=2, K=3, h_1 = (1, 1, 1), and h_2 = (1, 0, 1). As an example, suppose 
that the sequence "101000..." is presented to this encoder (from left to right). The resulting output is then 
y="11 10 00 10 11 00...". 
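
The behavior of this small encoder can be reproduced with a short Python sketch. It models the version with K-1 flip-flops, in which the current input bit participates in the computation; the function name `conv_encode` and its `taps` parameter are hypothetical names chosen for this illustration.

```python
def conv_encode(bits, taps=((1, 1, 1), (1, 0, 1))):
    """Rate-1/n convolutional encoder; taps[i] holds the coefficients h_i, current bit first."""
    K = len(taps[0])
    window = [0] * K                       # window[0] = current x, window[j] = x delayed by j
    pairs = []
    for x in bits:
        window = [x] + window[:-1]         # shift the new bit in
        y = [sum(h * v for h, v in zip(row, window)) % 2 for row in taps]
        pairs.append("".join(str(b) for b in y))
    return " ".join(pairs)


print(conv_encode([1, 0, 1, 0, 0, 0]))     # 11 10 00 10 11 00
```

Passing the two seven-element coefficient vectors of the K=7 code as `taps` gives the corresponding encoder without any other change.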

Any convolutional encoder can be represented by an FSM (finite state machine; see Chapter 15), so the 
state transition diagram for the encoder of Figure 7.11(a) was included in Figure 7.11(c) (the machine has 
2^(K-1) = 4 states, called A, B, C, and D). In this case, the current value of x must be treated as an input to 
the FSM instead of simply a state-control bit, causing each state to exhibit two possible values for y; for 
example, when in state A, the output can be y="00" (if x='0') or y="11" (if x='1'). 


FIGURE 7.11. (a), (b) Convolutional encoder with k=1, n=2, and K=3 implemented with two and three 
flip-flops, respectively; (c) state transition diagram for the FSM in (a). 
The FSM model can be used to determine the number of bits that the code can correct, given by 
⌊(d_free - 1)/2⌋, where d_free is the code’s minimum free distance, that is, the minimum Hamming distance 
between the encoded codewords. This parameter is determined by calculating the Hamming weight 
(w = number of '1's) in y for all paths departing from A and returning to A, without repetition. Due to the 
reason mentioned above (dual output values), examining all paths in the diagram of Figure 7.11(c) is not 
simple, so either the extended version of the FSM diagram (Figure 7.11(b)) is used (so all outputs have a 
fixed value) or a trellis diagram is employed. 

The latter is shown in Figure 7.12(a). Inside the circles are the state values (contents of the K-1=2 
shift register stages), and outside are the values of y (= y_1 y_2) in each branch. Note that each state has 
(as mentioned above) two possible output values represented by two lines that depart from each state; 
when x='0', the upper line (transition) occurs, while for x='1' the lower transition happens. All paths 
from A to A, without repetition, are marked with thicker lines in Figure 7.12(b), where dashed lines 
indicate dead ends (repetitions). Note that “repetition” means a transition that has already occurred, 
like from node B to node C (BC), not the repetition of a node (because each node allows two paths 
or two pairs of values for y). The shortest path in Figure 7.12(b) is also the one with the smallest w; 
the path is A→B→C→A, which produces y="11 10 11", so w=5 (= d_free). Therefore, this code can 
correct ⌊(5-1)/2⌋ = 2 errors. However, for the code to exhibit this free distance, a minimum run of bits is 
needed in x; for example, this minimum is 2K-1 bits when the last K-1 bits are a tail of zeros. 


FIGURE 7.12. (a) General trellis diagram for the K=3 convolutional encoder of Figure 7.11(a), constructed with 
K-1=2 flip-flops in the shift register; (b) Single-pass paths departing from A and returning to A (dashed lines 
indicate a dead end, that is, a repetition). 


FIGURE 7.13. (a) State transition diagram for the expanded version (with 2^K = 8 states) of the FSM (Figure 
7.11(b)), which causes the output (y = y_1 y_2) in each state to be fixed; (b) Corresponding trellis diagram with 
the shortest path (from A to A), plus one of the longest paths, and also a dead path (due to the repetition of 
state D) highlighted. 


As mentioned above, the other option to examine the paths from A to A is to use the expanded version 
(Figure 7.11(b)) of the FSM. This case is depicted in Figure 7.13(a), and the corresponding trellis is in 
Figure 7.13(b). The system starts in A, with all flip-flops reset, progressing to B at the next clock transition 
if x='1' or remaining in A otherwise, with a similar reasoning applicable to the other states. Note 
that the output (y) in each state is now fixed, so following the machine paths is easier. Starting from A, 
and returning to A without repeating any node (now we look simply at nodes instead of transitions 
between nodes), we find that the following seven paths exist (the corresponding Hamming weights are 
also listed): 


Path 1: ABCEA (w=5) 
Path 2: ABDGEA (w=6) 
Path 3: ABDHGEA (w=7) 
Path 4: ABDGFCEA (w=7) 
Path 5: ABCFDGEA (w=7) 
Path 6: ABDHGFCEA (w=8) 
Path 7: ABCFDHGEA (w=8) 


The shortest path, one of the longest paths, and also a dead path are marked with thicker lines in the 
trellis of Figure 7.13(b). Note that, as expected, the smallest w is still 5. In this version of the FSM the 
longest path (without repetition) cannot have more than eight transitions because there are only eight 
states (for such paths, w=8). Likewise, because there are K=3 flip-flops in the shift register, at least 
K+1=4 transitions (clock cycles) are needed for a bit to completely traverse the shift register, so the 
shortest path cannot have fewer than K+1=4 transitions (note that path 1 above has only four transitions, 
while paths 6-7 have eight). 
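
The path enumeration above can be checked with a short depth-first search in Python. The assignment of the letters A to H to the contents of the three flip-flops is an assumption made here (it is consistent with the seven paths listed, but the figure should be consulted for the actual labeling), and the function names are hypothetical.

```python
# Assumed labeling: state = (x_-1, x_-2, x_-3) packed as a 3-bit integer, x_-1 in the MSB
NAMES = {0b000: "A", 0b100: "B", 0b010: "C", 0b110: "D",
         0b001: "E", 0b101: "F", 0b011: "G", 0b111: "H"}


def next_state(state, x):
    """Shift the new input bit into the register."""
    return (x << 2) | (state >> 1)


def output_weight(state):
    """Number of '1's in y = y_1 y_2 for h_1 = (1, 1, 1) and h_2 = (1, 0, 1)."""
    a, b, c = (state >> 2) & 1, (state >> 1) & 1, state & 1
    return (a ^ b ^ c) + (a ^ c)


def paths_from_a():
    """All paths that leave A and return to A without visiting any state twice."""
    found = []

    def dfs(path, weight):
        for x in (0, 1):
            nxt = next_state(path[-1], x)
            if nxt == 0:
                if len(path) > 1:          # ignore the A -> A self-loop
                    found.append(("".join(NAMES[s] for s in path) + "A", weight))
            elif nxt not in path:
                dfs(path + [nxt], weight + output_weight(nxt))

    dfs([0], 0)
    return found


for name, w in sorted(paths_from_a(), key=lambda p: (p[1], p[0])):
    print(name, w)
```

Under these assumptions the search returns seven paths, with the minimum weight (5) obtained for ABCEA.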

Even though convolutional encoders are extremely simple circuits, the corresponding decoders are 
not. The two main types of decoders are called Viterbi decoder [Viterbi67] and sequential decoder (of which 
the Fano algorithm [Fano63] is the most common). In the former, the size grows exponentially with K 
(which is consequently limited to ~10), but the decoding time is fixed. The latter is more appropriate 
for large Ks, but the decoding time is variable, causing large signal latencies, so the Viterbi algorithm is 
generally preferred. Moreover, in some applications small convolutional codes have been concatenated 
with Reed-Solomon codes to provide very small error rates. In such cases, the decoder is referred to as a 
concatenated RSV (Reed-Solomon-Viterbi) decoder. 


7.8 Viterbi Decoder 


As mentioned above, the Viterbi decoder is the most common decoder for convolutional codes. 
It is a maximum-likelihood algorithm (the codeword with the smallest Hamming distance to the 
received word is taken as the correct one), and it can be easily explained using the trellis diagram 
of Figure 7.12. 


[Figure 7.14 diagram: x = "0110000..." → convolutional encoder → y = "00 11 01 01 11 00 00..." → noisy channel → 
y* = "00 11 00 00 11 00 00..." → Viterbi decoder → y** = "00 11 01 01 11 00 00..." (→ x).] 


FIGURE 7.14. Example of sequence processed by the K=3 convolutional encoder of Figure 7.11(a) with two 
errors introduced in the communications channel. 
Suppose that the K=3 encoder of Figure 7.11(a) is employed and receives the bit stream x="0110000..." 
(starting from the left) depicted in Figure 7.14. Using the trellis of Figure 7.12(a), we verify that the 
sequence y="00 11 01 01 11 00 00..." is produced by the encoder. However, as shown in Figure 7.14, 
let us assume that two errors are introduced by the channel, causing y*="00 11 00 00 11 00 00..." to be 
actually received by the decoder. 

The decoding procedure is illustrated in Figure 7.15 where hard decision is employed (hard decision is 
called so because it is based on a single decision bit, that is, zero or one; Viterbi decoders can also operate 
with soft decision, which is based on integers rather than on a zero-or-one value, leading to superior 
performance). 

The encoder of Figure 7.15 was assumed to be initially in state A. When the first pair of values 
(y*="00") is received (Figure 7.15(a)), the Hamming distance (which constitutes the code’s metric) 
between it and the only two branches of the trellis allowed so far is calculated and accumulated. 
Then the next pair of values (y*="11") is received (Figure 7.15(b)), and a similar calculation is devel- 
oped with the new accumulated metric now including all four trellis nodes. The actual decisions will 
begin in the next iteration. When the next pair of values (y*="00") is presented (Figure 7.15(c)), each 
node is reached by two paths. Adding the accumulated metrics allows each node to decide which 
of the two paths to keep (the path with the smallest metric is chosen). This procedure is repeated 
for the whole sequence, finally reaching the situation depicted in Figure 7.15(g), where the last pair 
of values is presented. Note the accumulated metric in the four possible paths (one for each node). 
Because A is the node with the smallest metric, the path leading to A is taken as the “correct” code- 
word. Tracing back (see the thick line in Figure 7.15(h)), y**="00 11 01 01 11 00 00..." results, which 
coincides with the transmitted codeword (y), so the actual x can be recovered without errors in spite 
of the two errors introduced by the channel (recall that the K=3 code can correct two errors even if 
they are close to each other, or it can correct various groups of up to two errors if the groups are far 
enough apart). 

To recover the original sequence during traceback, all that is needed is to observe whether the upper 
or lower branch was taken in each transition. The first traceback step in Figure 7.15(h) is from A to A; note 
that, when going forward, from node A the encoder can only move to node A (if input ='0') or B (if '1'), 
so the first traceback bit is '0'. The second traceback step is also from A to A, so the second bit is also 
'0'. The third traceback step is from A to C; because node C is connected to nodes A (for input ='0') and 
B (for '1'), the third bit is again '0'. Following this reasoning for all seven traceback steps, the sequence 
"0110000" results (where the first traced-back bit is the rightmost), which coincides with the original 
sequence. 
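
The complete hard-decision procedure can be sketched in Python as follows. The sketch assumes the same tap patterns (111 and 101) for the encoder of Figure 7.11(a) and represents each state by the pair of register bits, so that A=(0,0), B=(1,0), C=(0,1), and D=(1,1). For brevity, each node stores its surviving input sequence directly instead of back-pointers, which replaces the explicit traceback of Figure 7.15(h) but selects the same path. All names are illustrative.

```python
def hamming(a, b):
    """Hamming distance between two equally long bit tuples."""
    return sum(p != q for p, q in zip(a, b))

def step(state, x):
    """Next state and output pair of the assumed K=3 encoder (taps 111 and 101)."""
    s1, s2 = state
    return (x, s1), (x ^ s1 ^ s2, x ^ s2)

def viterbi_decode(pairs):
    """Return (decoded bits, accumulated metric) for a list of received bit pairs."""
    INF = float("inf")
    states = [(0, 0), (1, 0), (0, 1), (1, 1)]           # A, B, C, D
    metric = {s: (0 if s == (0, 0) else INF) for s in states}
    path = {s: [] for s in states}                      # surviving input bits per node
    for r in pairs:
        new_metric = {s: INF for s in states}
        new_path = {s: [] for s in states}
        for s in states:
            if metric[s] == INF:                        # node not reachable yet
                continue
            for x in (0, 1):
                nxt, out = step(s, x)
                m = metric[s] + hamming(out, r)
                if m < new_metric[nxt]:                 # keep the path with the smallest metric
                    new_metric[nxt] = m
                    new_path[nxt] = path[s] + [x]
        metric, path = new_metric, new_path
    best = min(states, key=lambda s: metric[s])         # truncated sequence: pick the best node
    return path[best], metric[best]

received = [(0, 0), (1, 1), (0, 0), (0, 0), (1, 1), (0, 0), (0, 0)]   # y* of Figure 7.14
bits, dist = viterbi_decode(received)
print(bits, dist)  # prints [0, 1, 1, 0, 0, 0, 0] 2
```

The final metric of 2 equals the number of channel errors, and the recovered bits coincide with the original x.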

A fundamental issue regards the depth (L) of the decoder. The more the decoder iterates before 
deciding to trace back to find the (most likely) encoded sequence, the better. It was shown [Viterbi67], 
however, that little gain occurs after ~5K iterations. For the K=3 encoder, this means that L ~ 15 iterations 
should occur before any output is actually made available. Likewise, for the K=7 code of Figure 7.10(b), 
about 35 iterations are required. Therefore, L determines the amount of delay between the received and 
the decoded codewords, as well as the amount of memory needed to store the trellis states. These two 
parameters (delay and memory) are therefore the ultimate deciding factors in the choice of K in an actual 
application. 

Another observation regards the traceback procedure. If the sequence is simply truncated, then
the starting node for traceback is the output node with the smallest metric (node A in the example
of Figure 7.15(h)). However, determining the node with the smallest metric requires additional
computational effort, which can be avoided with the inclusion of a tail of K-1 zeros at the end of
the bit sequence, which forces the final node to always be A, so traceback can always start from it
without any previous calculation.
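
The effect of the tail can be illustrated with a minimal Python sketch that tracks only the contents of a (K-1)-stage shift register; the function name and the test message are hypothetical.

```python
def final_state(bits, K=3):
    """Contents of a (K-1)-stage shift register after all bits have been shifted in."""
    reg = [0] * (K - 1)
    for x in bits:
        reg = [x] + reg[:-1]     # newest bit enters on the left
    return tuple(reg)

message = [0, 1, 1, 0, 1]                 # hypothetical message
print(final_state(message))               # prints (1, 0)
print(final_state(message + [0, 0]))      # prints (0, 0), that is, state A
```

Whatever the message, the K-1 tail zeros flush the register, so the trellis always ends in state A.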


FIGURE 7.15. Viterbi decoding procedure for the sequence shown in Figure 7.14 (with two errors). The last 
trellis shows the traceback, which produces the corrected codeword. 
7.9 Turbo Codes


A new class of convolutional codes, called turbo codes [Berrou93], was introduced in 1993. Because it 
achieves even better performance than the concatenated RSV code mentioned earlier, intense research 
followed its publication. 

Before we present its architecture, let us observe the convolutional encoder of Figure 7.16. This circuit
is equivalent to that in Figure 7.11(a), also with k=1, n=2, and K=3 (note that the option with K-1=2
flip-flops in the shift register was employed, which does not alter the encoding). The difference between
these two circuits is that the new one has a recursive input (that is, one of the outputs is fed back to the
input); moreover, the output that no longer exists was replaced with the input itself (so now $y_1=x$; note
that the other output, $y_2$, is still computed in the same way). The resulting encoder is then systematic (the
information bits are separated from the parity bits). Because it is recursive, systematic, and convolutional,
it is called an RSC encoder. As expected, the output sequence generated by this encoder is different from
that produced by the equivalent encoder of Figure 7.11(a). In general, the RSC option tends to produce
codewords with higher Hamming weights, leading to a slightly superior performance in low
signal-to-noise ratio (SNR) channels.

Now we turn to the architecture of turbo codes. As shown in Figure 7.17, it consists of two (normally
identical) RSCs connected in parallel. Note that there is also an interleaver at the input of the
second encoder, so it receives a different bit sequence. Note also that each individual encoder has only
one output ($y_1$ in encoder 1, $y_2$ in encoder 2) because the other output is the input itself. If the output
sequence is $\ldots x\,y_1\,y_2\;x\,y_1\,y_2\;x\,y_1\,y_2 \ldots$, then the overall code rate is r=1/3.
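
A rough Python sketch of this arrangement is given below. It assumes that each RSC encoder feeds back the 111 tap pattern and computes its parity with the 101 pattern (the usual recursive counterpart of the K=3 encoder discussed earlier, not necessarily the exact wiring of Figure 7.16), and that the interleaver is a fixed permutation; the permutation shown is hypothetical.

```python
def rsc_parity(bits):
    """Parity stream of an assumed RSC encoder (feedback taps 111, forward taps 101)."""
    s1 = s2 = 0
    parity = []
    for x in bits:
        a = x ^ s1 ^ s2           # recursive input: data bit XOR fed-back register taps
        parity.append(a ^ s2)     # parity output, taps on first and last stages
        s1, s2 = a, s1            # shift the register
    return parity

def turbo_encode(bits, perm):
    """Return the stream x, y1, y2, x, y1, y2, ... produced by two parallel RSC encoders."""
    y1 = rsc_parity(bits)
    y2 = rsc_parity([bits[i] for i in perm])    # encoder 2 receives the interleaved bits
    out = []
    for x, p1, p2 in zip(bits, y1, y2):
        out += [x, p1, p2]
    return out

x = [0, 1, 1, 0, 0, 0, 0]
perm = [3, 0, 6, 2, 5, 1, 4]      # hypothetical interleaver permutation
code = turbo_encode(x, perm)
print(len(code) / len(x))         # prints 3.0, that is, rate r=1/3
```

Every third output bit is an information bit, which reflects the systematic nature of the code.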


FIGURE 7.16. Recursive systematic convolutional (RSC) encoder with k=1, n=2, and K=3, equivalent to the
nonrecursive convolutional (NRC) encoder of Figure 7.11(a) (the case with K-1=2 flip-flops in the shift register
was used here). Note that $y_1$ is the input itself (systematic encoder), while $y_2$ is still computed in the same way.




FIGURE 7.17. Turbo encoder, which consists of two RSC encoders connected in parallel (but separated by an 
interleaver). 
A major advantage of this encoding scheme is that the decoder is constructed with two individual
(separated) decoders, connected in series, plus a de-interleaver, so the overall decoding complexity is
not much higher than that of convolutional codes alone. These convolutional decoders are implemented
using soft decision (Bahl algorithm and others), achieving higher performance. In summary, the overall
complexity is comparable to that of the concatenated RSV (Reed-Solomon-Viterbi) decoder mentioned
earlier, but the overall error-correcting capability is slightly superior. The main disadvantage is
latency. Nevertheless, turbo codes are already used in several applications, like third-generation
cell phones and satellite communications. Their growth, however, was overshadowed by the rediscovery
of LDPC codes only a few years later.


7.10 Low Density Parity Check (LDPC) Codes 


LDPC codes were developed by R. Gallager during his PhD at MIT in the 1960s [Gallager63]. However, 
the discovery of Reed-Solomon codes, with a simpler decoding procedure, followed by concatenated 
RSV codes and eventually turbo codes, caused LDPC codes to remain unnoticed until they were brought 
to attention again in 1996 [MacKay96]. Subsequently, it was demonstrated that LDPC codes can out- 
perform turbo codes (and therefore any other code known so far), being only a fraction of a decibel shy 
of the Shannon limit. For that reason, LDPC codes are expected to become the industry standard for high-tech
applications like digital video broadcasting and next-generation cellular phones.

Like any other code, an LDPC code can be represented by the pair (n, k), where n indicates the number of
channel bits for every k information bits. Therefore, m=n-k is again the number of parity (redundancy) bits.
Additionally, like any linear code, LDPC codes can be represented by a parity-check matrix, H, whose
dimension is again m-by-n. However, in the case of LDPC, two additional conditions arise: (1) n is large
(often many thousand bits), and consequently so is m; (2) H contains only a few '1's per row or column.
The latter condition is referred to as parity-check sparsity and is specified using $w_{row}$ and $w_{col}$, which
represent the Hamming weights (number of '1's) of the rows and columns of H, respectively, with $w_{row} \ll n$
and $w_{col} \ll m$.

When $w_{row}$ and $w_{col}$ are constant, the code is said to be regular, while nonconstant values render an
irregular code. The codes originally proposed by Gallager fall in the former category, but it has been
demonstrated that irregular codes with very large n can achieve superior performance. An example of a regular
code (from [Gallager63]) is shown in Figure 7.18, with n=20, m=15 (so k=5), $w_{row}=4$, and $w_{col}=3$.
FIGURE 7.18. Example of low density parity check matrix, with n=20, m=15 (so k=5), $w_{row}=4$, and $w_{col}=3$.
LDPC codes are normally described using a Tanner graph. The construction of this type of graph is
illustrated in Figure 7.19(b), which implements the parity-check matrix of Figure 7.19(a) (though this is 
not a low density matrix, it is simple enough to describe the construction of Tanner graphs). As usual, 
the dimension of H is m-by-n, so m=4 and n=6 in this example. As can be seen, the graph contains m 
test-nodes (shown at the top) and n bit-nodes (at the bottom). Each test-node represents one parity-check 
equation (that is, one row of H); test-node 1 has connections to bit-nodes 1, 2, and 3 because those are the 
bits equal to '1' in row 1 of H; likewise, test-node 2 is connected to bit-nodes 1, 4, and 5 because those are 
the nonzero bits in row 2 of H, and so on. 
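
The construction can be sketched in Python as follows. The matrix entries are taken from the connections described in the text and in Equations 7.14(a)-(d) (test-node 3 to bit-nodes 2, 4, 6 and test-node 4 to bit-nodes 3, 5, 6); the function name and the dictionary layout are illustrative choices.

```python
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]

def tanner_graph(H):
    """Return the edges of the Tanner graph as two adjacency maps (1-based node numbers)."""
    test_to_bits = {r + 1: [c + 1 for c, v in enumerate(row) if v]
                    for r, row in enumerate(H)}
    bit_to_tests = {c + 1: [r + 1 for r, row in enumerate(H) if row[c]]
                    for c in range(len(H[0]))}
    return test_to_bits, bit_to_tests

test_to_bits, bit_to_tests = tanner_graph(H)
print(test_to_bits[1])   # prints [1, 2, 3]
print(test_to_bits[2])   # prints [1, 4, 5]
print(bit_to_tests[6])   # prints [3, 4]
```

Each '1' in H becomes one edge of the graph, so the row weights give the number of edges per test-node and the column weights give the number of edges per bit-node.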

Consequently, because each test-node represents one of the parity-check equations, the set of tests
performed by the test-nodes is equivalent to computing $Hc^T$, which must be zero when a correct codeword
is received. In other words, the parity-check tests performed by the four test-nodes in Figure 7.19(b) are
described by the following modulo-2 sums (XOR operations), which are all '0' for a valid codeword:


Test at test-node 1: $t_1 = c_1 \oplus c_2 \oplus c_3$ (7.13a)
Test at test-node 2: $t_2 = c_1 \oplus c_4 \oplus c_5$ (7.13b)
Test at test-node 3: $t_3 = c_2 \oplus c_4 \oplus c_6$ (7.13c)
Test at test-node 4: $t_4 = c_3 \oplus c_5 \oplus c_6$ (7.13d)
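
These four tests amount to a matrix-vector product over modulo-2 arithmetic, which can be sketched in Python as shown below. The matrix is the one implied by the test-node connections described in the text, and the function name `syndrome` is illustrative.

```python
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]

def syndrome(H, c):
    """Compute the tests t1..tm, that is, H times c (transposed) with modulo-2 sums."""
    return [sum(h * b for h, b in zip(row, c)) % 2 for row in H]

print(syndrome(H, [1, 1, 0, 1, 0, 0]))   # prints [0, 0, 0, 0] (valid codeword)
print(syndrome(H, [1, 0, 0, 1, 0, 0]))   # prints [1, 0, 1, 0] (second bit in error)
```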


Besides low density, another major requirement for H to represent an LDPC code is the absence
of 4-pass cycles (also called short cycles), which appear whenever two rows of H have '1's in the same
two columns. Such a cycle is illustrated with thick lines in Figure 7.20(b). This kind of
loop can cause the decoder to get stuck before the received codeword has been properly corrected, or it can
simply indicate that the received codeword contains errors when it does not. (Note that there is more
than one 4-pass cycle in Figure 7.20(b).)
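
A 4-pass cycle exists exactly when two test-nodes share two bit-nodes, so it can be detected by comparing the rows of H in pairs. The Python sketch below does this; `H_ok` is the matrix implied by the connections of Figure 7.19, while `H_bad` is a hypothetical matrix made up for illustration (it is not the matrix of Figure 7.20).

```python
from itertools import combinations

def has_4_cycle(H):
    """True if any two rows of H have '1's in two or more common columns."""
    for r1, r2 in combinations(H, 2):
        if sum(a & b for a, b in zip(r1, r2)) >= 2:
            return True
    return False

H_ok = [[1, 1, 1, 0, 0, 0],
        [1, 0, 0, 1, 1, 0],
        [0, 1, 0, 1, 0, 1],
        [0, 0, 1, 0, 1, 1]]
H_bad = [[1, 1, 1, 0, 0, 0],     # hypothetical: rows 1 and 2 share columns 1 and 2
         [1, 1, 0, 1, 0, 0],
         [0, 0, 1, 0, 1, 1],
         [0, 0, 0, 1, 1, 1]]

print(has_4_cycle(H_ok))    # prints False
print(has_4_cycle(H_bad))   # prints True
```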


(a) $$H=\begin{bmatrix}1&1&1&0&0&0\\1&0&0&1&1&0\\0&1&0&1&0&1\\0&0&1&0&1&1\end{bmatrix}$$

(b) Tanner graph with the test-nodes (total=m) at the top and the bit-nodes $c_1$ to $c_6$ (total=n) at the bottom.


FIGURE 7.19. (a) Parity-check matrix and (b) corresponding Tanner graph. 


[Figure 7.20 panels: (a) parity-check matrix; (b) corresponding Tanner graph, with a 4-pass cycle shown in thick lines]


FIGURE 7.20. Example of Tanner graph with 4-pass cycle. 
There are two main methods for generating LDPC codes, called random designs and combinatorial
designs. For large n, the random generation of LDPC matrices normally leads to good codes (pseudo- 
random methods are used in which 4-pass cycles are prevented or eliminated afterward). In combina- 
torial processes, more regular geometric structures are created, adequate for small n. The example in 
Figure 7.18 falls in the latter category. 

There are several algorithms for decoding LDPC codes, collectively called message-passing 
algorithms, because they pass messages back and forth in a Tanner graph. These algorithms can be 
divided into two main types, called belief-propagation (with hard or soft decision) and sum-product 
(with soft decision only). 

In belief-propagation, the information is passed either as a 1-bit decision (zero or one, hence called
hard decision) or as a probability (called soft decision). If the logarithm of the probability is used in the
latter, then it is called the sum-product algorithm. Soft decision algorithms achieve better performance
than hard decision, but the general idea is simpler to understand using the latter, which is described
below.

Consider once again the parity-check matrix and corresponding Tanner graph seen in Figure 7.19.
When a codeword is received, the test-nodes perform the tests $t_1$ to $t_4$ (Equations 7.13(a)-(d)). If all yield
'0', then the received codeword is valid and the decoding procedure should end. Otherwise, information
passing must occur as follows. Test-node 1 receives information (codeword values) from bit-nodes 1, 2,
and 3, which it uses to compute the information that must be passed back to those same nodes. However,
the information returned to bit-node 1 (which is a modulo-2 sum, or XOR operation) cannot include the
information received from that node; in other words, the information passed from test-node 1 to bit-node 1
is $t_{11} = c_2 \oplus c_3$. The same reasoning applies to the other nodes, so the complete information set passed from
the test- to the bit-nodes in Figure 7.19(b) is the following:


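From test-node 1 to bit-nodes 1, 2, 3: $t_{11} = c_2 \oplus c_3$, $t_{12} = c_1 \oplus c_3$, $t_{13} = c_1 \oplus c_2$ (7.14a)
From test-node 2 to bit-nodes 1, 4, 5: $t_{21} = c_4 \oplus c_5$, $t_{24} = c_1 \oplus c_5$, $t_{25} = c_1 \oplus c_4$ (7.14b)
From test-node 3 to bit-nodes 2, 4, 6: $t_{32} = c_4 \oplus c_6$, $t_{34} = c_2 \oplus c_6$, $t_{36} = c_2 \oplus c_4$ (7.14c)
From test-node 4 to bit-nodes 3, 5, 6: $t_{43} = c_5 \oplus c_6$, $t_{45} = c_3 \oplus c_6$, $t_{46} = c_3 \oplus c_5$ (7.14d)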


Upon receiving this information, the bit-nodes must update the codeword values. This is done using a
majority function, that is, a codeword bit is flipped if most of the information bits that are received (which
are assumed to be true, thus the name belief-propagation) disagree with the bit’s current value.
In summary, the set of computations performed by the bit-nodes is the following:


By bit-node 1: $c_1 = \mathrm{majority}(t_{11}, t_{21}, c_1)$ (7.15a)
By bit-node 2: $c_2 = \mathrm{majority}(t_{12}, t_{32}, c_2)$ (7.15b)
By bit-node 3: $c_3 = \mathrm{majority}(t_{13}, t_{43}, c_3)$ (7.15c)
By bit-node 4: $c_4 = \mathrm{majority}(t_{24}, t_{34}, c_4)$ (7.15d)
By bit-node 5: $c_5 = \mathrm{majority}(t_{25}, t_{45}, c_5)$ (7.15e)
By bit-node 6: $c_6 = \mathrm{majority}(t_{36}, t_{46}, c_6)$ (7.15f)
FIGURE 7.21. Belief-propagation decoding procedure for the LDPC code represented by the parity-check
matrix of Figure 7.19(a). The received codeword is "100100", which is decoded as (corrected to) "110100". 


As an example, suppose that the codeword c="110100" was transmitted (note that this is a valid
codeword because $Hc^T=0$), and that noise in the communications channel caused an error in its second bit,
so c'="100100" was received. The decoding procedure is illustrated in Figure 7.21. In Figure 7.21(a),
the received codeword is presented to the decoder (at the bit-nodes), so ones (solid lines) and zeros
(dashed lines) are transmitted to the test-nodes. In Figure 7.21(b), the test-nodes calculate $t_1$, $t_2$, $t_3$, and $t_4$
(Equations 7.13(a)-(d)), resulting in "1010". Because not all tests produced a zero, there is at least one
error in the received codeword, thus the decoding procedure must continue. In Figure 7.21(c), the
test-nodes produce the individual information bits $t_{ij}$ to be sent to the bit-nodes (Equations 7.14(a)-(d));
again, ones are represented by solid lines, and zeros are represented by dashed lines. In Figure 7.21(d),
each bit-node computes the majority function (Equations 7.15(a)-(f)) to update the codeword bits
(indicated by arrows). In Figure 7.21(e), the new codeword values ("110100") are sent back to the test-nodes.
Finally, in Figure 7.21(f), the test-nodes redo the tests, now producing $t_1=t_2=t_3=t_4=$'0', so the decoding
procedure is concluded and the decoded (corrected) codeword is c="110100", which coincides with the
transmitted codeword.
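
The whole procedure (tests, messages from test-nodes to bit-nodes, and majority update) can be condensed into a short Python sketch. It is a minimal hard-decision version under the assumptions stated in the text; the function name, the iteration limit, and the rule of keeping a bit unchanged on a tied vote are illustrative choices.

```python
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]

def decode_hard(H, c, max_iters=10):
    """Hard-decision message passing; returns (codeword, True if all tests are '0')."""
    c = list(c)
    m, n = len(H), len(c)
    for _ in range(max_iters):
        tests = [sum(H[i][j] & c[j] for j in range(n)) % 2 for i in range(m)]
        if not any(tests):                       # all parity checks satisfied
            return c, True
        new_c = []
        for j in range(n):
            # Message from test-node i to bit-node j: XOR of the other bits in row i,
            # obtained by removing c[j] from the full test value.
            msgs = [tests[i] ^ c[j] for i in range(m) if H[i][j]]
            votes = msgs + [c[j]]                # the bit's own value also votes
            ones = sum(votes)
            if 2 * ones > len(votes):
                new_c.append(1)
            elif 2 * ones < len(votes):
                new_c.append(0)
            else:
                new_c.append(c[j])               # tie (assumed rule): keep current value
        c = new_c
    return c, False

print(decode_hard(H, [1, 0, 0, 1, 0, 0]))   # prints ([1, 1, 0, 1, 0, 0], True)
```

For the received word "100100", a single iteration flips the second bit and all four tests then yield '0', reproducing the result of Figure 7.21.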


7.11 Exercises 


1. Single parity check code 


Assume that the character string “Hi!”, encoded using the ASCII code, must be transmitted using 
an asynchronous transmission protocol similar to that seen in Figure 7.2, which includes start, stop, 
and parity bits. Draw the corresponding 33-bit waveform. 


2. CRC code #1 
a. Make a sketch (like that in Figure 7.4) for a circuit that implements the CRC-8 encoder. 


b. Using the circuit drawn above, find the CRC-8 value for the string d(x)="01100010", previously calcu- 
lated in Figure 7.3 using regular polynomial division. Compare your result against that in Figure 7.3. 
3. CRC code #2

Using the CRC-8 circuit drawn in part (a) of the exercise above, find the CRC-8 value for the 14-bit
character string “Hi”, assuming that it has been encoded using the ASCII code.

4. Error-correcting codes

a. Error-correcting codes are normally represented by the pair (n, k). What is the meaning of n and k?

b. What is the meaning of rate of a code? Write its expression as a function of n and k. 

c. Suppose that the minimum Hamming distance between the codewords of a certain code is
$d_{\min}=9$. How many errors can it correct and how many errors can it detect?

d. Repeat part (c) for $d_{\min}=6$.

5. Hamming code #1

a. How many errors can a Hamming code correct? 

b. If one is willing to add four redundancy bits to each codeword to produce a system capable of 
correcting one error per codeword, what is the maximum length of the final codeword (with the 
m=4 parity bits included)? 

c. For the case in (b), write the (n, k) pair for the corresponding Hamming code. 

d. Still regarding (b), what is the maximum number of information codewords? 

e. Finally, what is the code rate in (b)? 

6. Hamming code #2

Consider the (7, 4) Hamming code defined by the parity-check matrix H and generator matrix G of
Figure 7.5.

a. The codewords are seven bits long. How many of these bits are information bits and how many 
are parity bits? 

b. What is the rate of this code? 

c. Why is this particular case called systematic? 

d. Why is this particular case called cyclic? Show some examples to illustrate your answer. 

7. Hamming code #3

Consider again the (7, 4) Hamming code of Figure 7.5.

a. If the information word "1111" is presented to the encoder, what codeword does it produce at the
output? Determine the codeword using the generator matrix, G, then compare it to that listed in
the table.

b. Suppose that this codeword is transmitted over a noisy channel and is received by the decoder
at the other end with the rightmost bit in error. Using the syndrome decoding procedure (which
employs the parity-check matrix, H), show that the decoder is capable of correcting it.

c. Suppose that now the last two bits are received with errors. Decode the codeword and confirm
that two errors cannot be corrected.
8. Hamming code #4


Consider the Hamming code defined by the parity-check matrix of Figure E7.8 (note that the
columns of H are organized in increasing order of the corresponding decimal values, that is,
H=[1 2 3 4 5 6 7]).


[Parity-check matrix H of Figure E7.8]


FIGURE E7.8. 


a. What are the parameters k, n, m, and $d_{\min}$ of this code?

b. How many errors can it correct?

c. Find a generator matrix, G, for this code.

d. Is this code systematic?

e. Is this code cyclic?


9. Reed-Solomon code 


Suppose that a regular RS code is designed to operate with blocks whose individual symbols are 
four bits long (so b=4). 


a. What is the total number of symbols (n) in this code?

b. If m=4 symbols in each block are parity (redundancy) symbols, what is the code’s (n, k)
specification?

c. How many symbols can the code above correct?

d. Because each symbol contains four bits, does it matter how many of these four bits are wrong
for the code to be able to correct a symbol?

e. Why is it said that RS codes are recommended for the correction of error bursts?


10. Audio CD #1 


a. What is the playing rate, in information bits per second, of a conventional audio CD?

b. What is its playing rate in CIRC-encoded bits per second?

c. What is the playing rate in sectors per second?

d. Given that the total playing time is 74 minutes, how many sectors are there?


11. Audio CD #2 


a. Frame is the smallest entity of an audio CD. Explain its construction.

b. Sector (or block) is the next CD entity. What does it contain?

c. Given that the playing time of an audio CD is around 75 minutes, what is its approximate
capacity in information bytes?
12. Audio CD #3


Given that a conventional audio CD plays at a constant speed of approximately 1.3 m/s, prove the
following:


a. That the amount of information bits is ~1.09 Mb/m.

b. That the amount of CIRC-encoded bits is ~1.45 Mb/m.

c. That the amount of disk (or channel) bits is ~3.32 Mb/m.

d. That a 2.7-mm scratch can corrupt ~3.9k CIRC-encoded bits.


13. Interleaving


Consider the block interleaving shown in Figure E7.13 where data are written in vertically (one row 
at a time) and read out horizontally (one column at a time). The data written to the block are the 
ASCII string “Hello!!” Read it out and check in the ASCII table (Figure 2.11) the resulting (inter- 
leaved) seven-character string. 


[Figure E7.13: interleaving block, with the write direction vertical and the read direction horizontal]


FIGURE E7.13. 


14. Convolutional code #1 
Consider the convolutional encoder with k=1, n=2, and K=7 of Figure 7.10(b).

a. What is the meaning of these parameters (k, n, K)?


b. Suppose that the input (x) rate is 100 Mbps. With what frequency must the switch (a multiplexer)
that connects $y_1$ and $y_2$ to y operate?


c. If the input sequence is x="11000...", what is the output sequence?


d. The diagram shows a shift register with K stages (flip-flops). However, we saw that an equivalent
encoder results with K-1 stages. Sketch such a circuit.


15. Convolutional code #2 


A convolutional encoder with K=7 was shown in Figure 7.10(b), and another with K=3 was
shown in Figure 7.11(a). Draw another convolutional encoder, with K=4 (k=1, n=2), given that
its coefficients are $h_1=(h_{11}, h_{12}, h_{13}, h_{14})=(1, 0, 1, 1)$ and $h_2=(h_{21}, h_{22}, h_{23}, h_{24})=(1, 1, 0, 1)$.
Make two sketches:


a. For K stages in the shift register.


b. For K-1 stages in the shift register.
16. Convolutional code #3


Consider the convolutional encoder with k=1, n=2, and K=3 of Figure 7.11(a).


a. Given that the encoder is initially in state A="00" and that the sequence x="101000..." (starting
from the left) is presented to it, determine the corresponding (encoded) sequence, y. (Use the
trellis diagram of Figure 7.12 to easily solve this exercise.)


b. Suppose now that the sequence y="11 10 00 10 10 00 00..." was produced and subsequently
transmitted over a noisy channel and was received at the other end with two errors, that is,
y*="10 11 00 10 10 00 00...". Using the Viterbi algorithm (as in Figure 7.15), decode the received
string and check whether the decoder is capable of correcting the errors (as you know, it should,
because this code’s free distance is 5).


17. Turbo codes


a. Why are turbo codes said to belong to the family of convolutional codes? 


b. Briefly describe the main components of a turbo code. Why are the individual encoders called 
RSC? 

c. The convolutional encoder of Figure 7.10(b) is nonrecursive. Make the modifications needed to 
make it usable in a turbo code. 

d. As we saw, turbo codes are quite new (1993). Even though they could outperform all codes in 
use at that time, why were they overshadowed only a few years later? 

18. LDPC code #1


Consider the parity-check matrix presented in Figure 7.18. If only the upper third of it were employed,
what would the code’s parameters (n, m, k, $w_{row}$, $w_{col}$) be? Would it be a regular or irregular LDPC code?


19. LDPC code #2


Consider a code represented by the parity-check matrix of Figure E7.19 (this is obviously not an actual 
low-density matrix, but it is simple enough to allow the decoding procedure to be performed by hand). 


a. What are this code’s n, m, k, $w_{row}$, and $w_{col}$ parameters?

b. Draw the corresponding Tanner graph.

c. Show that this graph contains a 4-pass cycle.

d. Check whether the codewords "00111100" and "00111000" belong to this code.


[Parity-check matrix H of Figure E7.19]


FIGURE E7.19. 
e. Suppose that the codeword "00111100" is received by the decoder. Decode it using the
belief-propagation procedure described in Figure 7.21. Compare the result with your answer 
in part (d) above. 


f. Repeat part (e) above for the case when "00111000" is received. Even though this codeword has 
only one error, can this decoder correct it? 
20. LDPC code #3 
Consider a code represented by the parity-check matrix of Figure E7.20. 


a. What are the corresponding values of n, m, k, $w_{row}$, and $w_{col}$? Is this code regular or irregular?


b. Draw the corresponding Tanner graph. Does it contain 4-pass cycles? 
c. Check whether the codewords "0011011" and "0011111" belong to this code. 


d. Suppose that the codeword "0011011" is received by the decoder. Decode it using the procedure 
described in Figure 7.21. Compare the result against your answer in part (c) above. 


e. Repeat part (d) above for the case when "0011111" is received. Again, compare the result against 
your answer in part (c). 


[Parity-check matrix H of Figure E7.20]


FIGURE E7.20. 
8 Bipolar Transistor


Objective: Transistors are the main components of digital circuits. Consequently, to truly understand 
such circuits, it is indispensable to know how transistors are constructed and how they function. Even 
though the bipolar transistor, which was the first type used in digital circuits, is now used in fewer digital 
applications, a study of it facilitates an understanding of the evolution of digital logic and why MOS- 
based circuits are so popular. Moreover, the fastest circuits are still built with bipolar transistors, and 
new construction techniques have improved their speed even further (these new technologies are also 
included in this chapter). It is also important to mention that for analog circuits, the bipolar transistor still 
is a major contender. 


Chapter Contents 


8.1 Semiconductors
8.2 The Bipolar Junction Transistor
8.3 I-V Characteristics
8.4 DC Response
8.5 Transient Response
8.6 AC Response
8.7 Modern BJTs
8.8 Exercises


8.1 Semiconductors 


The preferred semiconductor for the fabrication of electronic devices is silicon (Si) because Si-based 
processes are more mature, and they cost less than other semiconductor processes. 

To construct transistors, diodes, or any other devices, the semiconductor must be “doped,” which con- 
sists of introducing controlled amounts of other materials (called dopants) into the original semiconductor. 
In the case of Si, which has four valence electrons, such materials belong either to group III (like B, Al, Ga, 
In) or group V (P, As, Sb) of the periodic table. 

Semiconductor doping is illustrated in Figure 8.1. In Figure 8.1(a), a dopant with five valence elec- 
trons (phosphorus, P) was introduced into the crystalline structure of Si. P forms four covalent bonds 
with neighboring Si atoms, leaving its fifth electron, which has a very small bonding energy, free to wan- 
der around in the structure. As a result, the region surrounding the P atom becomes positively charged 
(a fixed ion), while the electron becomes a free charge. Because P has “donated” an electron to the struc- 
ture, it is said to be a donor-type dopant, and because the free charge is negative, the doped material is 
said to be an n-type semiconductor. 




The reverse situation is depicted in Figure 8.1(b), where a valence-3 atom (boron, B) was employed, 
creating a fixed negatively charged region around the B atom plus a free hole. Because B “accepts” an 
electron from the structure, it is said to be an acceptor-type dopant, and because the free charge is positive, 
the doped material is said to be a p-type semiconductor. 

Typical doping concentrations are Nj=10 atoms/cm? for donors and N,=10'atoms/cm? for 
acceptors. There are, however, cases when the semiconductor must be heavily doped (to construct wires 
or to create good ohmic contacts, for example), in which concentrations around $10^{18}$ atoms/cm$^3$ are
employed (Si has a total of ~$10^{22}$ atoms/cm$^3$). Heavily doped regions are identified with a “+” sign after
n or p (that is, n+ or p+). 

Besides Si, Ge (germanium) and GaAs (gallium arsenide) are also very important semiconductors for 
the construction of electronic devices. A comparison between their electron and hole mobilities (to which 
the device’s maximum operating speed is intimately related) can be observed in Figure 8.2. 

While GaAs has the highest electron mobility, it also exhibits the poorest hole mobility, so for high- 
frequency circuits, only electron-based GaAs devices are useful. Ge, on the other hand, has good 
mobility for electrons and for holes. Even though its intrinsic carrier concentration (Figure 8.2) is 
too high for certain analog applications (like very-low-noise amplifiers), that is not a problem for 
digital circuits. The importance of these materials (GaAs and Ge) resides mainly in the fact that they 
can be combined with Si in modern construction techniques to obtain extremely fast transistors 
(advanced bipolar transistors are described in Section 8.7, and advanced MOS transistors are described 
in Section 10.8). 




FIGURE 8.1. (a) Doping of silicon with a donor (phosphorus), resulting in an n-type material; (b) Doping with 
an acceptor (boron), resulting in a p-type material. 


Semiconductor    Electron mobility (cm$^2$/V·s)    Hole mobility (cm$^2$/V·s)    Intrinsic carrier concentration at 25°C (free pairs per cm$^3$)


GaAs 8500 400 ~10’ 


FIGURE 8.2. Carrier mobility and intrinsic carrier concentration of main semiconductors. 
8.2 The Bipolar Junction Transistor


Transistors play the main role in any digital circuit. The first logic families were constructed with bipolar 
junction transistors (BJTs), while current digital integrated circuits employ almost exclusively MOSFETs. 
The former is described in this chapter, while the latter is discussed in Chapter 9. 

As mentioned in Section 1.1, the bipolar junction transistor, also called bipolar transistor or simply 
transistor, was invented at Bell Laboratories in 1947. As depicted in Figure 8.3(a), it consists of a very thin 
n-type region (called base), sandwiched between a heavily doped p+ region (called emitter) and a lightly 
doped p region (called collector). The n+ and p+ diffusions at the base and collector provide proper ohmic 
contacts. Because of its doping profile, this transistor is called pnp, and it is represented by the symbol 
shown on the left of Figure 8.3(b). The reverse doping profile constitutes the npn transistor, represented 
by the symbol on the right of Figure 8.3(b). Due to its higher speed and higher current gain, the latter is 
generally preferred. 

In digital applications, a BJT operates as a switch that closes between collector and emitter when
the base-emitter junction is forward biased, which for silicon requires a voltage $V_{BE} \approx 0.7$ V for npn or
$V_{BE} \approx -0.7$ V for pnp transistors. This behavior is illustrated in Figure 8.4. A simple switching circuit


pnp transistor npn transistor 


Emitter Base Collector — Metal contact E C 
SiO. 
Si B B 


(b) 


FIGURE 8.3. (a) Cross section of a pnp transistor; (b) pnp and npn transistor symbols. 


Vo =Voc 


Vo=0V 


FIGURE 8.4. (a) Simple switching circuit with npn BJT; (b) Cutoff mode ('0' applied to the input), represented 
by an open switch; (b) Saturation mode ('1' applied to the input), represented by a closed switch. 
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is shown in Figure 8.4(a), which employs an npn transistor, a load resistor (Rc) at the collector, and a 
current-limiting resistor (Rg) at the base. In Figure 8.4(b), '0' (0 V) is applied to the circuit, causing the tran- 
sistor to be cut off, hence [-=0 and, consequently, Vo= Vcc (=5 V='1'). The OFF transistor is represented 
by an open switch on the right of Figure 8.4(b). The reciprocal situation is depicted in Figure 8.4(c), that 
is,a'l' (Vcc) is applied to the input, which causes the transistor to be turned ON (because Vec>0.7V), 
hence allowing I, to flow and, consequently, causing Ic to also exist. The ON transistor is represented by 
a closed switch on the right of Figure 8.4(c). (The actual value of Vo when the transistor is saturated is 
not exactly 0 V, but around 0.1 V to 0.2 V for low-power transistors.) 


8.3 I-V Characteristics 


Figure 8.5 shows typical I-V characteristics for a BJT. The plots relate the collector signals (I_C × V_CE), with
each curve measured for a fixed value of I_B. The plot contains two regions that are called active and
saturation. Linear (analog) circuits operate in the former, while switching (digital) circuits operate either
in the latter or in cutoff.

To be in the active or in the saturation region, the transistor must first be turned ON, which requires
V_BE ≈ 0.7 V (for a silicon npn transistor). If I_C is not too large, such that V_CE ≥ V_BE, then the transistor
operates in the active region, where the relationship between the base current, I_B, and the collector
current, I_C, is very simple, given by I_C = βI_B, where β is the transistor's current gain. Observe that β = 150
in Figure 8.5 (just divide I_C by I_B in the active region). On the other hand, if I_C is large enough to cause
V_CE < V_BE, then both transistor junctions (base-emitter and base-collector) are forward biased, and the
transistor operates in the saturation region (note the dividing, dashed curve corresponding to V_CE = V_BE
in the figure), in which I_C = βI_B no longer holds. In it, I_C < βI_B because I_C is prevented from growing any
further, but I_B is not.

FIGURE 8.5. Typical I-V characteristics of an npn bipolar transistor (I_C in mA versus V_CE in V, with curves
for I_B = 20, 40, 60, and 80 µA; the dashed saturation curve corresponds to V_CE = V_BE, and V_CEsat = 0.2 V).

The transistor behavior described above can be summarized by the following equations (where the
junction voltage, V_j, is ~0.7 V for Si or ~0.2 V for Ge):

Cutoff region:

$$V_{BE} < V_j \;\Rightarrow\; I_C = 0 \tag{8.1}$$

Active region:

$$V_{BE} \approx V_j,\quad V_{CE} \ge V_{BE} \;\Rightarrow\; I_C = \beta I_B \tag{8.2}$$

Saturation region:

$$V_{BE} \approx V_j,\quad V_{CE} < V_{BE} \;\Rightarrow\; I_C < \beta I_B \tag{8.3}$$
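
Equations 8.1 to 8.3 can be read as a simple decision procedure. The sketch below is only an illustration
of that reading; the function name bjt_region and the default V_j = 0.7 V (silicon) are assumptions made
here, and the condition V_BE ≈ V_j is approximated by V_BE ≥ V_j.

```python
def bjt_region(v_be, v_ce, v_j=0.7):
    """Classify the operating region of an npn BJT following Equations 8.1-8.3.

    v_be, v_ce: base-emitter and collector-emitter voltages, in volts.
    v_j: junction voltage (assumed 0.7 V for silicon).
    """
    if v_be < v_j:
        return "cutoff"        # Eq. 8.1: I_C = 0
    if v_ce >= v_be:
        return "active"        # Eq. 8.2: I_C = beta * I_B
    return "saturation"        # Eq. 8.3: I_C < beta * I_B


print(bjt_region(0.0, 5.0))   # transistor OFF
print(bjt_region(0.7, 2.0))   # ON, with V_CE above V_BE
print(bjt_region(0.7, 0.2))   # ON, with V_CE below V_BE
```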


As can be observed in Figure 8.5, the active region is characterized by the fact that (ideally) I_C is not
affected by V_CE, that is, if I_B stays constant, then I_C stays constant too (I_C = βI_B) for any V_CE.

In the saturation region, however, V_CE does affect I_C. Moreover, for I_C > 0, V_CE cannot decrease below
~0.1 V to 0.2 V (for low-power transistors; higher values are exhibited by high-power BJTs); this voltage
is called V_CEsat. To simplify circuit analysis, the transistor is sometimes considered to be saturated not
when V_CE reaches V_BE but when V_CE cannot decrease any further (that is, when it reaches V_CEsat).

Another important aspect related to the saturation region is derived from Equation 8.3: I_B > I_C/β
implies that the higher I_B, the more saturated the transistor is. To indicate approximately how deep into
saturation the transistor is, the following parameter can be employed:

Saturation factor:

$$\sigma = \frac{\beta I_B}{I_C} \tag{8.4}$$

In the equation above, I_B and I_C are the actual values of the base and collector currents, respectively.
Therefore, σ is the ratio between the current that would occur in the collector (βI_B) in case the transistor
had not been saturated and the actual current there.

σ = 1 occurs in the active region, whereas σ > 1 indicates that the transistor is saturated. For example, if
σ = 2, then the value of I_B is approximately twice the minimum value needed to saturate the transistor
(recall that, strictly speaking, the transistor enters the saturation region when V_CE = V_BE). For digital
applications, σ > 1 is necessary to guarantee that the transistor remains saturated when β is smaller than
the estimated value (due to parameter dispersion or temperature decrease) and also to turn the transistor
ON faster. On the other hand, a large σ reduces the OFF switching speed and increases power consumption.
The use of these equations will be illustrated shortly.
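
As a quick numeric illustration of Equation 8.4, the sketch below computes the saturation factor and the
minimum base current I_C/β. The function names and the sample currents are hypothetical, chosen only to
show the arithmetic.

```python
def saturation_factor(beta, i_b, i_c):
    """Equation 8.4: ratio between beta*I_B and the actual collector current."""
    return beta * i_b / i_c


def min_base_current(beta, i_c):
    """Smallest base current for which I_C = beta*I_B can still hold."""
    return i_c / beta


# Hypothetical operating point: beta = 150, I_B = 100 uA, I_C = 10 mA.
sigma = saturation_factor(150, 100e-6, 10e-3)
i_b_min = min_base_current(150, 10e-3)
print(f"sigma = {sigma:.2f}, minimum I_B = {i_b_min * 1e6:.1f} uA")
```

With these sample values the base current is 1.5 times the minimum, so the transistor is saturated with
some margin against a lower-than-expected β.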


8.4 DC Response 


As seen in Section 1.11, the three main types of circuit responses are: 


■ DC response: Circuit response to a large slowly varying or static stimulus (employed for any type
of circuit)

■ Transient response: Circuit response to a large fast-varying (pulsed) stimulus (normally employed
for switching circuits)

■ AC response: Circuit response to a small sinusoidal stimulus (employed for linear circuits)


The presentation starts with the DC response, which, using the transistor’s I-V characteristics 
(Figure 8.5) or the corresponding Equations 8.1-8.3 combined with the circuit’s own parameters, can be 
easily determined. This can be done analytically (using the equations) or graphically (using a load line), 
as illustrated in the examples that follow. 
EXAMPLE 8.1 DC RESPONSE #1

Suppose that the transistor in Figure 8.4 exhibits β = 125, V_j = 0.7 V, and V_CEsat = 0.2 V. Given that
V_CC = 3.3 V, R_B = 68 kΩ, R_C = 1 kΩ, and that V_I is a slowly varying voltage in the range 0 V ≤ V_I ≤ V_CC,
calculate:

a. The range of V_I for which the transistor is (i) cut off, (ii) in the active region, and (iii) in the
saturation region.

b. The saturation factor (σ) when V_I is maximum.

SOLUTION
Part (a):

i. According to Equation 8.1, the transistor is OFF while V_I < V_j, that is, for 0 V ≤ V_I < 0.7 V.

ii. When V_I grows, so does I_C because I_C = βI_B and I_B = (V_I − V_j)/R_B. The growth of I_C causes V_CE
to decrease because V_CE = V_CC − R_C I_C. Eventually, the point where V_CE = V_BE (or, equivalently,
V_CB = 0) is reached, after which the transistor leaves the active for the saturation region. Using
V_CE = V_BE (= V_j), I_C = (V_CC − V_j)/R_C = 2.6 mA results in the limit between these regions, so
I_B = I_C/β = 20.8 µA and, consequently, from I_B = (V_I − V_j)/R_B, we obtain V_I = 2.11 V. Therefore, the
transistor operates in the active region for 0.7 V ≤ V_I ≤ 2.11 V.

iii. Finally, the saturation region occurs for 2.11 V < V_I ≤ 3.3 V.

Note: As mentioned earlier, for (ii) a simplification is often made in which saturation is considered
when V_CE = V_CEsat occurs instead of V_CE = V_BE. In that case, the (approximate) results would be
0.7 V ≤ V_I ≤ 2.39 V for the active region, so 2.39 V < V_I ≤ 3.3 V for the saturation region.

Part (b):

When V_I = 3.3 V (maximum), I_B = (V_I − V_j)/R_B = 38.2 µA and I_C = (V_CC − V_CEsat)/R_C = 3.1 mA. Therefore,
σ = βI_B/I_C = 125 × 0.0382/3.1 = 1.54, that is, the transistor operates with I_B approximately 54% higher
than the minimum needed to fully saturate it.
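
The arithmetic of Example 8.1 can be checked with a few lines of code. This is only a sketch: the constant
and function names are chosen here for illustration, and the circuit values are those given in the example.

```python
BETA = 125        # current gain
V_J = 0.7         # junction voltage, V
V_CE_SAT = 0.2    # saturation voltage, V
V_CC = 3.3        # supply voltage, V
R_B = 68e3        # base resistor, ohms
R_C = 1e3         # collector resistor, ohms


def input_voltage_at(v_ce):
    """Input voltage V_I that produces the given V_CE while I_C = beta*I_B still holds."""
    i_c = (V_CC - v_ce) / R_C
    i_b = i_c / BETA
    return V_J + R_B * i_b


v_i_strict = input_voltage_at(V_J)            # saturation taken at V_CE = V_BE
v_i_simplified = input_voltage_at(V_CE_SAT)   # saturation taken at V_CE = V_CEsat

# Part (b): saturation factor with the maximum input voltage (V_I = V_CC).
i_b_max = (V_CC - V_J) / R_B
i_c_sat = (V_CC - V_CE_SAT) / R_C
sigma = BETA * i_b_max / i_c_sat

print(f"Active region ends at V_I = {v_i_strict:.2f} V (strict)")
print(f"Active region ends at V_I = {v_i_simplified:.2f} V (simplified)")
print(f"sigma = {sigma:.2f}")
```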


EXAMPLE 8.2 DC RESPONSE #2 


This example illustrates how a graphical procedure (load line) can help examine the DC response.
Another common-emitter (inverter) circuit is shown in Figure 8.6(a) with a slowly varying voltage
0 V ≤ V_I ≤ V_CC applied to its input. Assume that the transistor's I-V characteristics are those in
Figure 8.5 (repeated in Figure 8.6(b)), so β = 150. Moreover, assume that V_j = 0.7 V and V_CEsat = 0.2 V.
Given that V_CC = 5 V, R_B = 43 kΩ, and R_C = 470 Ω, do the following:


a. Write the equation for the circuit’s DC load line, then draw it. 
b. Check whether the transistor is saturated when the input voltage is maximum. 


c. Calculate the coordinates for the operating points corresponding to cutoff and full saturation, 
then mark them on the load line. 


d. Calculate the coordinates for the point where the transistor enters the saturation region, then 
mark it on the load line. 


e. Highlight the section of the load line that belongs to the active region and comment on it. 
FIGURE 8.6. DC response of Example 8.2: (a) Basic common-emitter circuit (V_CC = 5 V, β = 150, V_j = 0.7 V,
V_CEsat = 0.2 V, R_B = 43 kΩ, R_C = 470 Ω); (b) Load line with operating points A to E.


SOLUTION 


Part (a): 

The load line equation involves the same signals employed in the I-V characteristics, that is, I_C and
V_CE. The simplest way to write such an equation is by going from V_CC to GND through the collector.
Doing so for the circuit of Figure 8.6(a), V_CC = R_C I_C + V_CE is obtained. To draw the line, two points
are needed; taking those for I_C = 0 (point A) and for V_CE = 0 (point E), the following results:
A(I_C = 0, V_CE = V_CC) = A(0 mA, 5 V) and E(I_C = V_CC/R_C, V_CE = 0) = E(10.64 mA, 0 V). These two points
correspond to the ends of the load line shown in Figure 8.6(b).


Part (b):

When V_I is maximum, I_B = (V_CC − V_j)/R_B = 100 µA, V_CE = V_CEsat = 0.2 V (assuming that the transistor
is fully saturated, which is checked below), and consequently I_C = (V_CC − V_CEsat)/R_C = 10.21 mA. From
Equation 8.4 we then obtain σ = βI_B/I_C = (150 × 0.100)/10.21 = 1.47, so the transistor is indeed saturated
and operating with a base current roughly 47% higher than the minimum needed to saturate
it (recall that this is an approximate/conservative estimate because the transistor actually enters the
saturation region when V_CE = V_BE).


Part (c):

Both points were already determined above, that is, A(0 mA, 5 V) and D(10.21 mA, 0.2 V), and they
are marked on the load line of Figure 8.6(b). Note that D lies on the intersection between the load
line and the I-V curve for I_B = 100 µA (because this is the transistor's base current in saturation).


Part (d):

It is important not to confuse the values of I_C and V_CE at which the circuit enters the saturation
region (this occurs for V_CE = V_BE) with those where the circuit actually rests within that region
(normally at V_CE = V_CEsat). With V_CE = V_BE, I_C = (V_CC − V_BE)/R_C = 9.15 mA results, so the coordinates
are C(9.15 mA, 0.7 V).


Part (e):
The active region spans the range 0.7 V ≤ V_CE ≤ 5 V or, equivalently, 0 mA ≤ I_C ≤ 9.15 mA. This
portion of the load line is highlighted by a gray area in Figure 8.6(b). Switching (digital) circuits
jump back and forth between points A and D, while linear (analog) circuits, like amplifiers and
filters, operate in the active region (point B, for example).
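
The load-line points of Example 8.2 follow directly from V_CC = R_C I_C + V_CE. The sketch below recomputes
points A, C, D, and E and the saturation factor; the names used (load_line_ic, points) are illustrative
choices, not part of the example.

```python
V_CC = 5.0        # supply voltage, V
R_B = 43e3        # base resistor, ohms
R_C = 470.0       # collector resistor, ohms
BETA = 150        # current gain
V_J = 0.7         # junction voltage, V
V_CE_SAT = 0.2    # saturation voltage, V


def load_line_ic(v_ce):
    """Collector current on the DC load line V_CC = R_C*I_C + V_CE."""
    return (V_CC - v_ce) / R_C


# Each point is stored as (I_C in amperes, V_CE in volts).
points = {
    "A": (load_line_ic(V_CC), V_CC),          # cutoff
    "C": (load_line_ic(V_J), V_J),            # entry into saturation (V_CE = V_BE)
    "D": (load_line_ic(V_CE_SAT), V_CE_SAT),  # full saturation
    "E": (load_line_ic(0.0), 0.0),            # end of the load line (V_CE = 0)
}

i_b_max = (V_CC - V_J) / R_B                  # base current with V_I = V_CC
sigma = BETA * i_b_max / points["D"][0]       # Equation 8.4

for name, (i_c, v_ce) in points.items():
    print(f"{name}: I_C = {i_c * 1e3:.2f} mA, V_CE = {v_ce:.1f} V")
print(f"sigma = {sigma:.2f}")
```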


EXAMPLE 8.3 DC RESPONSE #3 


The common-emitter circuit of Figure 8.7(a) served as the basis for one of the first logic
families, called DTL (diode-transistor logic, Section 10.2). Suppose that it is part of a digital
system that operates with logic voltages '0' = 0 V and '1' = 5 V. To examine its DC response, let
us consider again the application of a slowly varying voltage V_B to its input, ranging between
0 V and 5 V.


a. Plot V_BE as a function of V_B.
b. Plot I_B as a function of V_B.
c. Plot I_C as a function of V_B.
d. Plot V_CE as a function of V_B.


FIGURE 8.7. DC response of Example 8.3: (a) Common-emitter circuit to which a slowly varying voltage V_B is
applied (V_j = 0.7 V, V_CEsat = 0.2 V, R_1 = 47 kΩ, R_2 = 47 kΩ, R_C = 1 kΩ); (b) Corresponding V_BE and V_CE
plots; (c) Corresponding I_B and I_C plots.


SOLUTION 


Part (a): 

While the transistor is OFF, V_BE is given by V_BE = V_B R_2/(R_1 + R_2) = 0.5V_B. When V_BE reaches V_j = 0.7 V
(which occurs for V_B = 2V_j = 1.4 V), the transistor is turned ON. After that point, V_BE remains
approximately constant at 0.7 V. The plot of V_BE is shown in Figure 8.7(b).


Part (b):
While V_B < 1.4 V, the transistor is OFF, so I_B = 0. For V_B ≥ 1.4 V, the transistor is ON, with V_BE fixed
at 0.7 V. Therefore, I_2 = V_BE/R_2 = 14.9 µA is constant. Because I_1 = I_B + I_2, where I_1 = (V_B − V_BE)/R_1,
I_B = (V_B − V_BE)/R_1 − I_2 results, which is zero for V_B ≤ 1.4 V and 76.6 µA for V_B = 5 V. The plot of I_B is
shown in Figure 8.7(c).


Part (c):

While the transistor is OFF (V_B < 1.4 V), I_C = 0. When it is turned ON, it operates initially in the active
region, in which I_C = βI_B and V_CE is large. However, as I_C grows, V_CE decreases, causing the transistor
to eventually reach the point where V_CE = V_BE (0.7 V), below which it operates in the saturation region.
As mentioned before, to simplify circuit analysis and design, the active region is often considered
up to the point where V_CE reaches V_CEsat (~0.2 V), an approximation that will be adopted here. When
V_CE = V_CEsat, the collector current is I_C = (V_CC − V_CEsat)/R_C = 4.8 mA. With the approximation mentioned
above, this occurs for I_B = I_C/β = 4.8/150 = 32 µA, so V_B = R_1(I_B + I_2) + V_BE = 2.9 V. After this point,
I_C stays constant at 4.8 mA. The plot of I_C is also shown in Figure 8.7(c).


Part (d):

While the transistor is OFF, V_CE = 5 V. In the active region, V_CE = V_CC − R_C I_C, which lasts until V_B = 2.9 V
(recall the approximation mentioned above). After that, V_CE = V_CEsat = 0.2 V. The plot of V_CE is shown
in Figure 8.7(b).
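
The four plots of Example 8.3 come from one piecewise calculation, sketched below under the same
simplification adopted in the solution (saturation taken at V_CE = V_CEsat). The function name dc_response
is an assumption made here; β = 150 and V_CC = 5 V are the values used in the solution.

```python
V_CC = 5.0        # supply voltage, V
V_J = 0.7         # junction voltage, V
V_CE_SAT = 0.2    # saturation voltage, V
R1 = 47e3         # series resistor between the input and the base, ohms
R2 = 47e3         # resistor between the base and ground, ohms
R_C = 1e3         # collector resistor, ohms
BETA = 150        # current gain


def dc_response(v_b):
    """Return (V_BE, I_B, I_C, V_CE) for a static input voltage v_b, in volts."""
    v_be_off = v_b * R2 / (R1 + R2)          # resistive divider while the BJT is OFF
    if v_be_off < V_J:
        return v_be_off, 0.0, 0.0, V_CC      # cutoff
    i_b = (v_b - V_J) / R1 - V_J / R2        # I_B = I_1 - I_2 with V_BE clamped at V_j
    i_c_sat = (V_CC - V_CE_SAT) / R_C        # collector current once saturated
    i_c = min(BETA * i_b, i_c_sat)           # active region until I_C hits its ceiling
    return V_J, i_b, i_c, V_CC - R_C * i_c


for v_b in (1.0, 2.0, 3.0, 5.0):
    v_be, i_b, i_c, v_ce = dc_response(v_b)
    print(f"V_B = {v_b:.1f} V: V_BE = {v_be:.2f} V, I_B = {i_b * 1e6:.1f} uA, "
          f"I_C = {i_c * 1e3:.2f} mA, V_CE = {v_ce:.2f} V")
```

Sweeping v_b from 0 V to 5 V with this function reproduces the breakpoints at approximately 1.4 V and
2.9 V discussed in the solution.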


8.5 Transient Response 


A circuit’s behavior in the face of a large pulse is called transient response, which is a fundamental measure 
of the dynamic (temporal) performance of switching (digital) circuits. 

This type of response is illustrated in Figure 8.8. In Figure 8.8(a), the test circuit is shown, while
Figure 8.8(b) depicts its respective input and output signals. In this example, the input stimulus is
the voltage v_B (applied to the transistor's base through a current-limiting resistor R_B) and the output
(response) is the collector current, i_C. Note that lowercase instead of capital letters are used to represent
temporal (instantaneous) signals. Observe that the transitions of the measured signal (i_C) are composed
of four time intervals, called time delay (t_d), rise time (t_r), storage delay time (t_s), and fall time (t_f).

FIGURE 8.8. (a) Test circuit; (b) Transient response; (c) BJT constructed with a Schottky diode between base
and collector to reduce the turn-off time; (d) TTL NAND gate.

As illustrated in Figure 8.6(b), when a transistor in a digital circuit is turned ON, its operating
condition is changed from point A (cutoff) to point D (saturation). This process has three phases: (i) from
cutoff to the beginning of the active region, (ii) through the active region, and (iii) through the saturation
region until the final destination within it. The delay in (i) is t_d, that in (ii) is t_r, and that in (iii) is of little
importance (for digital systems) because i_C is then already near its final value. In summary, the total time
to turn the transistor ON is approximately t_on = t_d + t_r.

t_d is the time needed to charge the emitter-base junction capacitance from cutoff to the beginning of the
active region. After that, minority carriers (electrons in the case of an npn transistor) start building up in
the base, causing i_C to grow roughly in the same proportion. However, as i_C grows, v_CE decreases,
eventually reaching the point where v_CE = v_BE (~0.7 V), thus concluding the passage through the active region.

The time spent to traverse the active region is t_r, which is measured between 10% and 90% of i_C's static
values. It is important to recall that, even though i_C cannot grow much further in the saturation region,
i_B can, so charge continues building up in the base (known as saturation buildup). Because the amount of
charge available affects the time needed to charge the emitter-base junction capacitor (t_d) and also the
buildup of charges in the base (t_r), the larger I_B, the smaller t_on. Therefore, a heavily saturated transistor
(high σ) is turned ON faster than a lightly saturated one.

Conversely to the process described above, to turn a transistor OFF, its operating condition must
be changed from point D (saturation) back to point A (cutoff) in Figure 8.6(b). This process also has
three phases: (i) from saturation to the beginning of the active region, (ii) through the active region, and
(iii) through the cutoff region until i_C ceases completely. The delay in (i) is t_s, that in (ii) is t_f, and that in
(iii) is of little importance because i_C is already near its final value. In summary, the total time to turn the
transistor OFF is approximately t_off = t_s + t_f.

t_s is the time needed to remove the charge previously stored in the base during the saturation buildup
process. When enough charge has been removed, such that v_CE = v_BE, the transistor enters the active
region, during which the charge stored in the base continues being removed but now with i_C decreasing
roughly in the same proportion.

Similarly to t_r, t_f is the time taken to traverse the active region, measured between 90% and 10% of the
static values. Due to the nature of t_s, it is clear that a heavily saturated transistor (high σ) takes longer to
be turned OFF (more charge to be removed) than a lightly saturated one.
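
The two sums above (turn-on time from the time delay and rise time, turn-off time from the storage delay
and fall time) can be written as a small helper. The delay values in the sketch are hypothetical, chosen
only for illustration, and are not read from Figure 8.9.

```python
def switching_times(t_d, t_r, t_s, t_f):
    """Return (t_on, t_off) from the four transient delays, all in the same unit."""
    t_on = t_d + t_r     # cutoff -> through the active region -> edge of saturation
    t_off = t_s + t_f    # saturation -> through the active region -> cutoff
    return t_on, t_off


# Hypothetical delays, in nanoseconds, for a low-speed transistor.
t_on, t_off = switching_times(t_d=10, t_r=25, t_s=200, t_f=60)
print(f"t_on = {t_on} ns, t_off = {t_off} ns")
```

With values of this kind the storage delay dominates the turn-off time, which is why deep saturation is
costly in terms of switching speed.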

The storage delay (t_s) is generally the dominant term in conventional low-speed transistors (see Figure 8.9).
A classical technique for reducing it is to avoid deep-saturation operation. A transistor for that purpose is
depicted in Figure 8.8(c), constructed with a Schottky (clamp) diode connected between base and collector.
Such a diode is obtained from a metal-semiconductor contact, leading to a very compact and inexpensive
structure with a low junction voltage, V_j ≈ 0.4 V; therefore, V_CE can never go below ~0.3 V.


Transistor | Manuf. | t_d (ns) max. | t_r (ns) max. | t_s (ns) max. | t_f (ns) max. | f_T (MHz) min.
2N2222 | Philips ‘0 : 200 = 300 | 


MM3725 | Motorola 300 


MPS3640 | Motorola i 
MPS4258 | Motorola 10 7s a 


FIGURE 8.9. Transient delays and transition frequencies of some commercial BJTs. 
Figure 8.8(d) shows the circuit of an actual NAND gate constructed with BJTs. This type of architecture
is called TTL (transistor-transistor logic) and will be discussed in Chapter 10. TTL gave birth to the
popular 74-series of digital chips.


EXAMPLE 8.4 TIME-RELATED TRANSISTOR PARAMETERS


Figure 8.9 shows the transient delays as well as the transition frequency (described next) of several
commercial transistors. Even though improvements in the former tend to improve the latter, note
that the relationship between them is not linear. Note also that, as expected, in low-speed transistors,
t_s is the largest transient delay.


8.6 AC Response 


The response of a circuit to a low-amplitude sinusoidal signal is called AC response, which is a fundamental
measure of the frequency response of linear circuits. Even though it is related to analog rather than
to digital circuits (except for f_T, which indirectly applies to digital circuits too), a brief description is
presented below.

To obtain the AC response of circuits employing BJTs, each transistor must be replaced with the corre- 
sponding small-signal model, after which the frequency-domain equations are derived, generally written 
in factored form of poles and zeros such that the corresponding corner frequencies and Bode plots can 
be easily obtained. Typical circuit parameters obtained from the AC response are voltage or current gain, 
input impedance, and output impedance. 

The BJT's small-signal model is depicted in Figure 8.10. In Figure 8.10(a), the model for low and
medium frequencies (no capacitors) is depicted, while Figure 8.10(b) exhibits the high-frequency
model (which includes the parasitic capacitors). Note that in the former, the dependent current source
is controlled by the AC current gain, h_fe, while in the latter it is controlled by the transconductance
factor, g_m (a relationship between them obviously exists, which is g_m = h_fe/r_π, where h_fe = β for low
frequencies). The larger C_π and C_µ, the slower the transistor. A traditional way of measuring the effect
of these capacitors is by means of a parameter called transition frequency (f_T), given by the equation
below.

$$f_T \approx \frac{g_m}{2\pi\,(C_\pi + C_\mu)} \tag{8.5}$$

The transconductance factor is determined by g_m = I_C/φ_t, where φ_t = 26 mV is the thermal voltage at
25°C, hence g_m ≈ 39I_C.

FIGURE 8.10. BJT's small-signal model (a) for low and medium frequencies and (b) for high frequencies. The
Bode plot in (c) illustrates the measurement of f_T (the frequency at which h_fe reduces to unity); beyond
the corner frequency, the slope is approximately −20 dB/dec.

To measure f_T, the frequency of the input signal (small-amplitude sinusoid) is increased until the
transistor's AC current gain (h_fe, Figure 8.10(a)), which at low frequencies is constant and large (~100 to
200), is reduced to unity (0 dB). This measurement is illustrated in the Bode plot of Figure 8.10(c), which
shows frequency on the horizontal axis and h_fe (in decibels, that is, 20log10(h_fe)) on the vertical axis
(h_fe = 150 was assumed at low frequency).
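
A short numeric sketch of Equation 8.5 and of the decibel conversion used in the Bode plot is given below.
The collector current and the capacitances (1 mA, 10 pF, 2 pF) are hypothetical values picked only to
show the orders of magnitude involved; the function names are also assumptions.

```python
import math

PHI_T = 0.026    # thermal voltage at 25 degrees C, in volts


def transconductance(i_c):
    """g_m = I_C / phi_t, in siemens, for a collector current in amperes."""
    return i_c / PHI_T


def transition_frequency(i_c, c_pi, c_mu):
    """Equation 8.5: f_T ~ g_m / (2*pi*(C_pi + C_mu)), in hertz."""
    return transconductance(i_c) / (2 * math.pi * (c_pi + c_mu))


def gain_db(h_fe):
    """Current gain expressed in decibels, 20*log10(h_fe)."""
    return 20 * math.log10(h_fe)


g_m = transconductance(1e-3)                     # hypothetical I_C = 1 mA
f_t = transition_frequency(1e-3, 10e-12, 2e-12)  # hypothetical C_pi = 10 pF, C_mu = 2 pF
print(f"g_m = {g_m * 1e3:.1f} mS, f_T = {f_t / 1e6:.0f} MHz")
print(f"h_fe = 150 corresponds to {gain_db(150):.1f} dB")
```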

The value of f_T depends on the transistor's transit times through the emitter, base, and collector
regions, of which the transit time (of minority carriers) through the base is generally the largest.
Therefore, to achieve high speed, that delay must be minimized. Common techniques for that are the
use of a very thin base layer (see Figure 8.3(a)) and the use of a graded doping in the base (higher
concentration near the emitter), which causes a built-in electric field that helps accelerate the carriers.
Moreover, because electrons are about three times faster than holes (Figure 8.2), npn transistors are
preferred for high-speed applications (their minority carriers in the base are electrons).

f_T is an important parameter because it allows the parasitic capacitors to be calculated (Equation 8.5),
so the maximum "usable" frequency in a given (linear) circuit can be estimated. However, such a
frequency depends not only on the transistor parameters (Figure 8.10(b)) but also on other circuit
parameters, like resistor values. For example, for common-emitter amplifiers, the cutoff frequency is normally
well below f_T/10. Some examples of f_T values were included in Figure 8.9. With top-performance
technologies (Section 8.7), f_T > 200 GHz is already achievable.

It is important to mention that a higher f_T makes not only a linear circuit constructed with that
transistor faster, but also a digital circuit that employs it. In summary, f_T is the main indicator of a
transistor's speed. For that reason, it is a common practice for research groups looking for faster transistors
(like those described in the next section) to announce their achievements using mainly the f_T value.


8.7 Modern BJTs 


To improve performance (notably speed), a series of advanced devices were and are being developed. 
Some important examples are described next. 


8.7.1 Polysilicon-Emitter BJT 


One way of boosting the speed of BJTs is depicted in Figure 8.11, which shows a transistor fabricated
with a very shallow emitter (compare Figure 8.11 to Figure 8.3). The thin emitter, on the other hand,
increases base-emitter back charge injection, hence reducing the emitter efficiency (γ) and, consequently,
the current gain (β). To compensate for the lower β, a polycrystalline Si film, also thin, is deposited on
top of the crystalline emitter layer, resulting in a two-layer emitter (polycrystalline-crystalline). Though
the actual compensation mechanism is rather complex, it is believed that the lower mobility of
polysilicon is partially responsible for decreasing the charge back-injection, thus reducing the shallow-emitter
effects on β.

FIGURE 8.11. Simplified cross section of a polysilicon-emitter BJT.


8.7.2 Heterojunction Bipolar Transistor 


A heterojunction bipolar transistor (HBT) is a transistor in which at least one of the junctions is formed
between dissimilar materials. The dissimilar junction of interest is the base-emitter junction. The
material with the wider energy gap is placed in the emitter, while the other constitutes the base. The difference
between the energy gap of the two materials (which is zero in a regular BJT) greatly improves the emitter
efficiency (γ) by preventing base-emitter back injection, which translates into a much higher (up to three
orders of magnitude) current gain (β). As a result, the base can now be more heavily doped to reduce its
resistance (smaller transit time), and the emitter can be less doped (reduced capacitance), resulting in a
faster transistor. Even though β decreases as the base is more heavily doped (β decreases in the same
proportion as the base doping concentration is elevated), this is viable because of the enormous boost of β
caused by the dissimilar material. A major constraint, however, is that the two materials must have similar
lattice constants (similar atomic distances) to avoid traps and generation-recombination centers at their
interface. Moreover, HBTs are often constructed with materials that exhibit higher electron mobility than Si,
like Ge, GaAs, and other III-V compounds. The two main HBT devices (GaAs-AlGaAs HBT and Si-SiGe HBT) are
described below.

GaAs-AlGaAs HBT: A heterojunction transistor having GaAs as the main material and AlGaAs as
the dissimilar material is depicted in Figure 8.12. GaAs constitutes the substrate, collector, and base,
while Al_xGa_(1−x)As (0 ≤ x ≤ 1) is employed in the emitter (dark layer). The extra n+ emitter layer provides
a proper ohmic contact with metal. The energy gap and the lattice constant of GaAs are E_g = 1.42 eV and
a = 5.6533 Å, respectively, while those of Al_xGa_(1−x)As, for x = 1, are E_g = 2.17 eV and a = 5.6605 Å.
Therefore, these two materials have similar lattice constants and, because the latter exhibits a higher E_g, it is
located in the emitter. Notice in Figure 8.12 that to attain the extra benefit described above (higher speed)
the emitter is less doped, and the base is more doped than in the regular BJT (Figure 8.3).

Si-SiGe HBT: Although more expensive than Si, more difficult to process, and with an intrinsic carrier
concentration (~10^13 free electron-hole pairs per cm³ at 25°C; see Figure 8.2) too high for certain analog
applications (e.g., very-low-noise amplifiers), Ge exhibits higher electron and hole mobilities than Si.
And, more importantly, Ge can be incorporated into a regular Si-based CMOS process with the simple
addition of a few extra fabrication steps, which is not viable with GaAs. This combination of
technologies is crucial for SoC (system-on-chip) designs because it makes possible the construction of
high-performance digital as well as analog circuits on the same die.

FIGURE 8.12. Simplified cross section of a GaAs-AlGaAs HBT.

FIGURE 8.13. Simplified cross section of a Si-SiGe HBT.

A simplified cross section of a Si-SiGe HBT is depicted in Figure 8.13. The main material is Si
(substrate, collector, and emitter), while the dissimilar material is Si_(1−x)Ge_x (base, indicated with a dark
layer). To avoid an undesirable energy-band discontinuity at the base-emitter junction, a small value of x is
used initially, which increases gradually toward the collector (typically up to ~30%). The gradual increase of
x creates a built-in accelerating electric field that further improves charge speed. As required, the two
materials have a relatively large difference in energy gap (1.12 eV for Si, 0.67 eV for Ge), and the material
with the wider energy gap (Si) is used in the emitter. Moreover, to achieve the extra high-speed benefit
described earlier, the base is more doped, and the emitter is less doped than in a regular BJT (Figure 8.3).
The n+ layer at the emitter provides the proper ohmic contact with metal. Si and Ge, however, do not have
similar lattice constants (5.431 Å for Si, 5.646 Å for Ge). The construction of Figure 8.13 is only possible
because of a phenomenon called semiconductor straining. When a very thin (generally <100 nm, depending
on x) SiGe film is grown on top of a preexisting Si layer, the former conforms with the atomic distance
of the latter (materials that conform with the lattice constant of others are called pseudomorphic). As a
result, construction defects at the Si-SiGe interface do not occur. The resulting transistor exhibits a
transition frequency f_T > 200 GHz, comparable to or even higher than that of GaAs-based transistors.
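
To put the "similar lattice constants" requirement in numbers, the sketch below computes the relative
mismatch for the two material pairs, using the lattice constants quoted in the text. The helper name
lattice_mismatch is an assumption, and the pure-Ge figure is an upper bound because the SiGe base contains
only a fraction x of Ge.

```python
def lattice_mismatch(a_layer, a_reference):
    """Relative difference between two lattice constants (same unit for both)."""
    return (a_layer - a_reference) / a_reference


# Lattice constants in angstroms, as quoted in the text.
gaas_pair = lattice_mismatch(5.6605, 5.6533)   # Al(x)Ga(1-x)As with x = 1, relative to GaAs
si_ge_pair = lattice_mismatch(5.646, 5.431)    # pure Ge relative to Si

print(f"AlGaAs (x = 1) on GaAs: {gaas_pair * 100:.2f}%")
print(f"Ge on Si:               {si_ge_pair * 100:.2f}%")
```

The first pair differs by roughly a tenth of a percent, while the second differs by several percent,
which is consistent with the need for a thin strained film in the Si-SiGe case.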

Note: Even though in the approach described above Ge was employed as a possible replacement for
GaAs, such a replacement is not possible in the fabrication of photo-emitting devices (LEDs, laser diodes,
etc.) because Ge and Si are not direct-bandgap materials, so they cannot emit light.


8.8 Exercises 
1. Semiconductors 
a. Briefly describe “semiconductor doping.” 


b. Briefly compare Ge, Si, and GaAs. Which can be used for very-low-noise amplifiers? And for 
light-emitting devices? Why is Si preferred whenever possible? 
2. BJT #1
a. Make a sketch similar to that in Figure 8.3(a) but for an npn transistor instead of pnp. 
b. The thinness of one of its three layers is what causes the BJT to exhibit a high current gain. 
Identify that layer in Figure 8.3(a) as well as in your sketch for part (a) above. 
c. Briefly describe the BJT’s operation as a switch. 
3. BJT #2 
Make a sketch for a pnp transistor operating as a switch similar to that in Figure 8.4. (Suggestion: 
See similar circuits, with MOSFETs, in Figures 4.1 and 4.2.) 
4. DC response #1
The questions below refer to the circuit of Figure E8.4, where V_X is a slowly varying voltage from 0 V
to 10 V. Note the presence of a negative supply (V_BB), which reduces the turn-off time by providing
a negative voltage at the base when V_X = 0.
a. For V_X = 0 V, calculate V_Y, I_1, I_2, I_B, I_C, and V_Z.
b. Repeat the calculations above for V_X = 10 V.
c. In part (b), is the transistor saturated? What is the value of σ?

FIGURE E8.4. Circuit parameters: V_CC = 10 V, V_BB = −10 V, β = 130, V_j = 0.7 V, V_CEsat = 0.2 V, R_1 = 50 kΩ,
R_2 = 150 kΩ, R_C = 1 kΩ.


5. DC response #2
The questions below pertain to the circuit of Figure E8.4.
a. Plot V_Y as a function of V_X.
b. Plot I_B as a function of V_X.
c. Plot I_C as a function of V_X.
d. Plot V_Z as a function of V_X.
e. For which values of V_X is the transistor (i) cut off, (ii) in the active region, and (iii) saturated?
f. Draw the load line for this circuit and mark on it (i) the cutoff point, (ii) the point where the
transistor enters the saturation region, and (iii) the point where the transistor rests while saturated.
6. DC response #3
Redo Exercise 8.4 with R_C reduced to 500 Ω.
7. DC response #4
Redo Exercise 8.5 with R_C reduced to 500 Ω.
8. DC response #5
Redo Exercise 8.4 with R_2 and V_BB removed from the circuit.
9. DC response #6
Redo Exercise 8.5 with R_2 and V_BB removed from the circuit.
10. DC response #7
Redo Example 8.2 with R_B = 39 kΩ instead of 43 kΩ.
11. DC response #8
Redo Example 8.3 with R_1 = R_2 = 39 kΩ instead of 47 kΩ.
12. Dynamic transistor parameters
a. Check the datasheets for the traditional 2N3904 BJT, and write down its transient delays (t_d, t_r,
t_s, t_f) and transition frequency (f_T).
b. Make a sketch for the transient response of this transistor (as in Figure 8.8(b)).
c. Look for the commercial transistor with the largest f_T that you can find.
13. Transient response
Make a sketch for the transient response relative to the circuit of Figure E8.4. Assume that v_X is a
0 V/10 V pulse with frequency 100 MHz and that the transistor switching delays are t_d = t_r = 1 ns and
t_s = t_f = 2 ns. Using v_X as reference, present three plots: for v_X (with no delays), for i_C (with the delays
above), and finally one for v_Z (derived from i_C).
14. Switching speed-up technique
Figure E8.14 shows an old technique for reducing the transistor's turn-on and turn-off times, which
consists of installing a capacitor in parallel with the base resistor. The advantage of this technique is
that it causes a brief positive spike on v_B at the moment when v_I transitions from '0' to '1' and also a
negative spike when it transitions from '1' to '0', thus reducing t_on as well as t_off without the need for
an extra supply (V_BB). Using v_I as reference, make a sketch of v_B to demonstrate that the spikes do
indeed happen. What is the time constant (τ = R_eq·C_eq) associated with each spike?


FIGURE E8.14.


MOS Transistor 


Objective: Modern digital circuits are implemented using almost exclusively MOS transistors. 
Therefore, to truly understand digital circuits, it is indispensable to know how MOS transistors are 
constructed and work, which is the purpose of this chapter. Basic circuit analysis is also included in the 
examples, as well as a special section on new MOS construction technologies for enhanced high-speed 
performance. 

9.1 Semiconductors 


Note: The material presented below, previously seen in Section 8.1, is indispensable for a good understanding 
of the sections that follow. For that reason, it is repeated here. 

The preferred semiconductor for the fabrication of electronic devices is silicon (Si), because Si-based 
processes are more mature and cost less than other semiconductor processes. However, to construct tran- 
sistors, diodes, or any other devices, the semiconductor must be “doped,” which consists of introducing 
controlled amounts of other materials (called dopants) into the original semiconductor. In the case of Si, 
which has four valence electrons, such materials belong either to group III (like B, Al, Ga, In) or group 
V (P, As, Sb) of the periodic table. 

Semiconductor doping is illustrated in Figure 9.1. In Figure 9.1(a), a dopant with five valence electrons 
(phosphorus, P) was introduced into the crystalline structure of Si. P forms four covalent bonds 
with neighboring Si atoms, leaving its fifth electron, which has a very small bonding energy, free to wander 
around in the structure. As a result, the region surrounding the P atom becomes positively charged 
(a fixed ion), while the electron becomes a free charge. Because P has “donated” an electron to the structure, 
it is said to be a donor-type dopant, and because the free charge is negative, the doped material is 
said to be an n-type semiconductor. 

Semic.    Mobility (cm^2/V·s)       Intrinsic carrier concentration @ 25°C 
          Electrons    Holes        (free pairs per cm^3) 
Si        1400         450          ~10^10 
Ge        3900         1900         ~10^13 
GaAs      8500         400          ~10^6 

(c) 

FIGURE 9.1. (a) Doping of silicon with a donor (phosphorus), resulting in an n-type material; (b) Doping with 
an acceptor (boron), resulting in a p-type material; (c) Carrier mobility and intrinsic carrier concentration of 
main semiconductors. 


The reverse situation is depicted in Figure 9.1(b), where a valence-3 atom (boron, B) was employed, 
creating a fixed negatively charged region around the B atom plus a free hole. Because B “accepts” an 
electron from the structure, it is said to be an acceptor-type dopant, and because the free charge is positive, 
the doped material is said to be a p-type semiconductor. 

Typical doping concentrations are N_D = 10^15 atoms/cm^3 for donors and N_A = 10^16 atoms/cm^3 for 
acceptors. There are, however, cases when the semiconductor must be heavily doped (to construct wires 
or to create good ohmic contacts, for example), in which concentrations around 10^18 atoms/cm^3 are 
employed (Si has a total of ~10^22 atoms/cm^3). Heavily doped regions are identified with a “+” sign after 
n or p (that is, n+ or p+). 

Besides Si, Ge (germanium) and GaAs (gallium arsenide) are also very important semiconductors for the 
construction of electronic devices. A comparison between their electron and hole mobilities (to which 
the device’s maximum operating speed is intimately related) is presented in Figure 9.1(c). While GaAs has 
the highest electron mobility, it also exhibits the poorest hole mobility, so for high-frequency circuits, only 
electron-based GaAs devices are useful. Ge, on the other hand, has good mobility for electrons and for holes. 
Even though its intrinsic carrier concentration is too high for certain analog applications (like very-low-noise 
amplifiers), that is not a problem for digital circuits. The importance of these materials (GaAs and Ge) lies 
mainly in the fact that they can be combined with Si in modern construction techniques to obtain extremely 
fast transistors (described in Sections 8.7, for bipolar transistors, and 9.8, for MOS transistors). 


9.2 The Field-Effect Transistor (MOSFET) 


MOSFETs (metal oxide semiconductor field effect transistors), also called MOS transistors, were introduced 
in Section 4.1. They have grown in popularity since the beginning of the 1970s, when the first integrated 
microprocessor (Intel 4004, ~2300 transistors, 1971), employing only MOSFETs, was introduced. 

Such popularity is due to the fact that a minimum-size MOSFET occupies much less space than a 
minimum-size BJT, plus no resistors or any other biasing components are needed in MOSFET-based 
digital circuits (thus saving silicon space). More importantly, MOSFETs allow the construction of logic 
circuits with virtually no static power consumption, which is impossible with BJTs. 


9.2.1 MOSFET Construction 


Figure 9.2(a) shows the cross section of an n-channel MOSFET (also called nMOS transistor). It consists of 
a p-doped substrate in which two heavily doped n+ islands are created, which are called source and drain. 
A control plate, called gate, is fabricated parallel to the substrate and is insulated from it by means of a 
very thin (<100 Å in deep-submicron devices) oxide layer. The substrate region immediately below the 
gate (along the substrate-oxide interface) is called channel because it is in this portion of the substrate that 
the electric current flows (this will be explained later). This MOSFET is said to be of type n (n-channel or 
nMOS) because the channel is constructed with electrons, which are negative. 

A similar structure, but with the opposite doping profile, is depicted in Figure 9.2(b). This 
time the substrate is of type n, so the channel is constructed with holes (which are positive), so it 
is said to be a p-type MOSFET (p-channel or pMOS). Symbols for both transistors are shown in 
Figure 9.2(d). 

A top view for any of the MOSFETs shown in Figures 9.2(a)-(b) is presented in Figure 9.2(c), where W 
is the channel width and L is the channel length. Intuitively, the larger the W and the shorter the L, the 
stronger the current through the channel for the same external voltages. 

A fundamental parameter that identifies a MOS technology is the smallest device dimension that 
can be fabricated (normally expressed in micrometers or nanometers), more specifically, the shortest 
possible channel length (L_min). This parameter was 8 µm in the beginning of the 1970s and is now 
just 65 nm (and continues shrinking, with 45 nm devices already demonstrated and expected to be 
shipped in 2008). For example, the top FPGA devices described in Chapter 18 (Stratix III and Virtex 5) 
are both fabricated using 65 nm MOS technology. One of the main benefits of transistor downsizing is 
the increased transistor switching speed because of reduced parasitic capacitances and reduced path 
resistances. 

In the design of ICs, the technology parameter mentioned above is normally expressed by means of λ 
(lambda), which is half the smallest technology dimension. For example, λ = 65 nm for 130 nm technology. 
The smallest channel dimension generally is W/L = 3λ/2λ, where W/L is called channel width-to-length 
ratio. As mentioned, the larger the W and the smaller the L, the stronger the transistor (that is, the higher 
its drain current for a fixed gate-to-source voltage). 

FIGURE 9.2. Cross sections of (a) n-channel and (b) p-channel MOSFETs; (c) Corresponding top view; 
(d) MOSFET symbols. 

9.2.2 MOSFET Operation 


The digital operation of MOS transistors was also introduced in Section 4.1 (Figures 4.1-4.3). We now 
look at the physical structure of a MOS transistor to explain its behavior. 

An n-channel MOSFET is shown in Figure 9.3. In Figure 9.3(a), the original structure is seen under 
electrical equilibrium (no external electric fields), so the transistor is OFF. In Figure 9.3(b), a positive 
voltage is applied to the gate (note that the substrate is grounded). Because of the insulating SiO2 layer 
under the gate, this structure resembles a parallel-plate capacitor with the gate acting as the positive 
plate and the substrate as the negative one. The gate voltage causes electrons (which are the minority 
carriers in the p-type substrate) to accumulate at the interface between the substrate and the oxide layer 
(channel) at the same time that it repels holes. If the gate voltage is large enough to cause the number of 
free electrons in the channel to outnumber the free holes, then that region behaves as if its doping were of 
type n. Because the source and drain diffusions are also of type n, a path then exists for electrons to flow 
between source and drain (transistor ready for conduction). Finally, in Figure 9.3(c), another positive 
voltage is applied, this time between drain and source, so electrons flow from one to the other through 
the channel (the actual direction of the electrons is from S to D, so the arrow goes from D to S because 
it is always drawn as if the carriers were positive). When the gate voltage is removed, the accumulated 
electrons diffuse away, extinguishing the channel and causing the electric current to cease. In summary, 
the transistor can operate as a switch that closes between terminals D and S when a positive voltage is 
applied to terminal G. 

If the MOSFET in Figure 9.3 were of type p, then a negative voltage would be needed at the gate to 
attract holes to the channel. This, however, would require an additional (negative) power supply in the 
system, which is obviously undesirable. To circumvent that problem, the substrate is connected to a positive 
voltage (such as V_DD = 3.3 V) instead of being connected to GND; consequently, any voltage below 
V_DD will look like a negative voltage to the gate. 

The minimum gate voltage required for the channel to “exist” is called threshold voltage (V_T). In summary, 
for an nMOS, V_GS ≥ V_T is needed to turn it ON, while for a pMOS, V_GS ≤ V_T (V_T is negative) is 
needed. Typical values of V_T are 0.4 V to 0.7 V for nMOS and -0.5 V to -0.9 V for pMOS. 

From the above it can be seen that a MOS transistor has a gate voltage controlling the drain to source 
current, whereas in a BJT a base current controls the current flow from collector to emitter. So a BJT is a 
current-controlled device and a MOS transistor is a voltage-controlled device. 

The qualitative MOS behavior described above will be translated into mathematical equations in the 
next section. 


FIGURE 9.3. MOSFET operation: (a) Original structure with no external electric fields (transistor OFF); 
(b) A positive gate voltage creates a “channel” between source and drain (the transistor is ready to conduct); 
(c) The addition of a voltage between the drain and source causes electrons to flow through the channel. 

9.3 I-V Characteristics 


Figure 9.4 depicts typical I-V characteristics for an nMOS transistor. In each curve the drain-to-source 
current (I_D) is plotted as a function of the drain-to-source voltage (V_DS) for a fixed value of the 
gate-to-source voltage (V_GS). 

Figure 9.4 is divided into two regions by the saturation curve (given by V_DSsat = V_GS - V_T). The region on 
the right is called saturation region, while that on the left is called linear (or triode) region. In the saturation 
region, I_D is practically constant (that is, independent of V_DS), hence the name saturation, while in the 
linear region the dependence of I_D on V_DS is roughly linear (though only for very small values of V_DS). 

There is a third region, called subthreshold region, which is below the curve for V_GS = V_T. Even though 
I_D is near zero in it, the tanh (hyperbolic tangent) behavior of I_D in it makes it suitable for certain analog 
applications, like neural networks. For digital applications, that region is normally disregarded. 

Note that the saturation and linear regions of a MOSFET correspond to the active and saturation regions 
of a BJT, respectively (these names can lead to confusion). Note also in Figure 9.4 that the I-V behavior of 
a MOSFET is inferior to that of a BJT in the sense that its saturation region is smaller than a BJT’s active 
region (compare the position of the saturation curve in Figure 9.4 against that in Figure 8.5). 

The I-V characteristics of Figure 9.4 can be summarized by the following equations: 

Cutoff region: 
$V_{GS} < V_T \rightarrow I_D = 0$  (9.1) 
Saturation region: 
$V_{GS} \ge V_T$ and $V_{DS} \ge V_{GS} - V_T \rightarrow I_D = (\beta/2)(V_{GS} - V_T)^2$  (9.2) 
Linear region: 
$V_{GS} \ge V_T$ and $V_{DS} < V_{GS} - V_T \rightarrow I_D = \beta[(V_{GS} - V_T)V_{DS} - V_{DS}^2/2]$  (9.3) 

β in the equations above is measured in A/V^2 (not to be confused with the β of BJTs) and is determined by 
β = µ·C_ox·(W/L), where µ is the electron (for nMOS) or hole (for pMOS) mobility in the channel, C_ox is the 
gate oxide capacitance per unit area, and W/L is the channel width-to-length ratio. C_ox can be determined 
by C_ox = ε_ox/t_ox, where ε_ox = 3.9ε_0 is the permittivity of SiO2, ε_0 = 8.85·10^-14 F/cm is the permittivity in 
vacuum, and t_ox is the thickness of the gate oxide. For example, C_ox = 1.73 fF/µm^2 results for t_ox = 200 Å. 
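Equations 9.1-9.3 can be read as one piecewise function of V_GS and V_DS. The sketch below is only an illustration of that reading; the function name `drain_current` and the sample values (β = 2 mA/V^2, V_T = 1 V) are assumptions chosen for the demonstration, not values given in this section.

```python
def drain_current(v_gs, v_ds, beta, v_t):
    """Return the nMOS drain current I_D (in A) according to Equations 9.1-9.3."""
    v_ov = v_gs - v_t                  # overdrive voltage, V_GS - V_T
    if v_ov < 0:                       # Equation 9.1: cutoff
        return 0.0
    if v_ds >= v_ov:                   # Equation 9.2: saturation
        return (beta / 2) * v_ov ** 2
    # Equation 9.3: linear (triode) region
    return beta * (v_ov * v_ds - v_ds ** 2 / 2)


# Hypothetical device: beta = 2 mA/V^2, V_T = 1 V
for v_gs, v_ds in [(0.5, 3.0), (2.0, 3.0), (5.0, 0.6)]:
    i_d = drain_current(v_gs, v_ds, beta=2e-3, v_t=1.0)
    print(f"V_GS={v_gs} V, V_DS={v_ds} V -> I_D={i_d * 1e3:.2f} mA")
```

Note that the two expressions agree at the boundary V_DS = V_GS - V_T, so the modeled current is continuous across the saturation curve.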


FIGURE 9.4. Typical I-V characteristics of an nMOS transistor. 

If it is an n-type MOSFET, the current is carried by electrons, whose mobility is given in 
Figure 9.1(c) (1400 cm^2/V·s for silicon). However, because the channel is formed at the Si-SiO2 interface, 
defects and scattering at the interface (the structure of SiO2 does not match the crystalline structure of Si) 
cause the actual mobility (called effective mobility) to be much lower (~50%) than that in the bulk. Moreover, 
the mobility decreases a little as V_GS increases. Assuming µ_n = 700 cm^2/V·s and t_ox = 200 Å, 
β = 121(W/L) µA/V^2 is obtained. The same reasoning applies to p-type MOSFETs. However, it is very 
important to observe that because of its lower mobility (Figure 9.1(c)), the value of β for a pMOS is about 
2.5 to 3 times smaller than that of a same-size nMOS (that is, β_n ≈ 2.5β_p). 
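The two numerical results quoted above (C_ox for a 200 Å oxide and β per unit W/L for an effective mobility of 700 cm^2/V·s) can be reproduced with a few lines. This is a minimal sketch; the helper names `c_ox` and `process_beta` are hypothetical, and all quantities are kept in cm-based units to match the constants in the text.

```python
EPS0 = 8.85e-14            # F/cm, permittivity of vacuum
EPS_OX = 3.9 * EPS0        # F/cm, permittivity of SiO2


def c_ox(t_ox_angstrom):
    """Gate oxide capacitance per unit area, in F/cm^2 (C_ox = eps_ox / t_ox)."""
    t_ox_cm = t_ox_angstrom * 1e-8     # 1 angstrom = 1e-8 cm
    return EPS_OX / t_ox_cm


def process_beta(mobility_cm2_per_vs, t_ox_angstrom, w_over_l=1.0):
    """beta = mu * C_ox * (W/L), in A/V^2."""
    return mobility_cm2_per_vs * c_ox(t_ox_angstrom) * w_over_l


# 1 F/cm^2 = 1e7 fF/um^2, because 1 cm^2 = 1e8 um^2 and 1 F = 1e15 fF
print(f"C_ox = {c_ox(200) * 1e7:.2f} fF/um^2")
print(f"beta = {process_beta(700, 200) * 1e6:.0f} (W/L) uA/V^2")
```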

The curve shown in Figure 9.4 that separates the triode from the saturation regions is called saturation 
curve and obeys the expression below. 

Saturation curve: 
$V_{DSsat} = V_{GS} - V_T$  (9.4) 

An important circuit parameter is the value of V_DS (or I_D) at the interface between these two regions. 
Such value is determined by finding the point where the circuit's load line (explained in Example 9.2) 
intercepts the saturation curve. If the circuit's load line is written in the form of Equation 9.5, with 
parameters a and b, then Equation 9.6 results: 

Circuit's load line: 
$V_{DS} = a \cdot I_D + b$  (9.5) 
Saturation-triode boundary: 
$V_{DS} = [1 - (1 - 2ab\beta)^{1/2}]/(a\beta)$ and $V_{GS} = V_{DS} + V_T$  (9.6) 

The use of all equations above will be illustrated shortly. 
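For readers who want to see where Equation 9.6 comes from, the steps can be sketched as follows. The derivation uses only Equations 9.2, 9.4, and 9.5; the choice of root assumes a resistive load (a < 0, b > 0), which is the case treated in the examples.

```latex
\begin{align*}
I_D &= \frac{\beta}{2}\,V_{DS}^2
  && \text{(Eq. 9.2 on the saturation curve, where } V_{GS}-V_T = V_{DS}\text{)} \\
V_{DS} &= a\,\frac{\beta}{2}\,V_{DS}^2 + b
  && \text{(substituting into Eq. 9.5)} \\
0 &= \frac{a\beta}{2}\,V_{DS}^2 - V_{DS} + b \\
V_{DS} &= \frac{1 \pm \sqrt{1 - 2ab\beta}}{a\beta}
\end{align*}
```

With a < 0 and b > 0, the square root exceeds 1 and the denominator is negative, so only the minus sign yields a positive V_DS, which is the form given in Equation 9.6.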


9.4 DC Response 


As seen in Section 1.11, the three main types of circuit responses are: 


■ DC response: Circuit response to a large slowly varying or static stimulus (employed for any type 
of circuit) 

■ Transient response: Circuit response to a large fast-varying (pulsed) stimulus (normally employed 
for switching circuits) 

■ AC response: Circuit response to a small sinusoidal stimulus (employed for linear circuits) 


The presentation starts with the DC response, which, using the transistor’s I-V characteristics (Figure 9.4), 
or the corresponding Equations (9.1-9.3), combined with the circuit’s own parameters, can be easily deter- 
mined. This can be done analytically (using the equations) or graphically (using a load line) as illustrated in 
the examples that follow. 


EXAMPLE 9.1 DC RESPONSE #1 

A basic common-source (inverting) circuit is depicted in Figure 9.5(a). It consists of a MOSFET with 
a load resistor R_D connected to the drain and an input-impedance determining resistor R_G connected 
to the gate. The circuit is called common-source because the source terminal is connected to a fixed 
voltage (ground in this case). Assuming that a slowly varying voltage V_G in the range from 0 V to 
V_DD is applied to the circuit, answer the following: 

a. For which range of V_G is the transistor in the cutoff region? 
b. For which range of V_G is the transistor in the saturation region? 
c. For which range of V_G is the transistor in the triode/linear region? 
d. Plot I_D as a function of V_G. 
e. Plot V_DS as a function of V_G. 

FIGURE 9.5. DC response of Example 9.1: (a) Common-source circuit to which a slowly varying voltage V_G is 
applied; (b) I_D × V_G plot; (c) V_DS × V_G plot. 


SOLUTION 


Part (a): 
According to Equation 9.1, the transistor remains OFF while V_GS < V_T. Because in this circuit 
V_GS = V_G, the cutoff region is for 0 V ≤ V_G < 1 V. (Note: In practice, it is not totally OFF, but operating 
in the subthreshold region, where the currents are extremely small, hence negligible for most 
applications.) 

Part (b): 
According to Equation 9.2, the transistor remains in the saturation region while V_DS ≥ V_GS - V_T. 
Therefore, using I_D = (β/2)(V_GS - V_T)^2 combined with V_DS = V_DD - R_D·I_D, plus the condition 
V_DS = V_GS - V_T, we obtain V_GS = 2.79 V. Therefore, the transistor operates in the saturation region while 
V_G is in the range 1 V ≤ V_G ≤ 2.79 V. For V_G = 2.79 V, I_D = 3.21 mA and V_DS = 1.79 V. 

Part (c): 
According to Equation 9.3, the transistor operates in the triode (or linear) region when V_DS < V_GS - V_T, 
which occurs for any V_G > 2.79 V. For V_G = 5 V, we obtain I_D = 4.4 mA and V_DS = 0.6 V. 

Part (d): 
The plot of I_D is shown in Figure 9.5(b). From 0 V to 1 V the transistor is OFF, so I_D = 0. From 1 V 
to 2.79 V it is in the saturation region, so Equation 9.2 was employed to sketch the current. Finally, 
above 2.79 V it operates in triode mode, so Equation 9.3 was used for I_D. As mentioned above, 
I_D = 3.21 mA for V_G = 2.79 V, and I_D = 4.4 mA for V_G = 5 V. 

Part (e): 
The plot of V_DS is shown in Figure 9.5(c). It is straightforward because V_DS = V_DD - R_D·I_D. The main 
values of V_DS are 5 V while the transistor is OFF, 1.79 V for V_G = 2.79 V, and finally 0.6 V for V_G = 5 V. 
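The numbers in this solution can be checked with a short numerical sketch. The circuit values appear only in Figure 9.5(a), which is not reproduced in the text, so the values below are assumptions: V_DD = 5 V and V_T = 1 V are implied by the solution, while R_D = 1 kΩ and β = 2 mA/V^2 are inferred from the quoted results. The function name `operating_point` is hypothetical.

```python
import math

V_DD, V_T = 5.0, 1.0      # volts, as implied by the solution text
R_D, BETA = 1e3, 2e-3     # ohms and A/V^2: assumed, inferred from the quoted results


def operating_point(v_g):
    """Return (region, I_D in A, V_DS in V) for gate voltage v_g (V_GS = V_G)."""
    v_ov = v_g - V_T                          # overdrive voltage
    if v_ov <= 0:
        return "cutoff", 0.0, V_DD            # Equation 9.1
    i_sat = (BETA / 2) * v_ov ** 2            # Equation 9.2
    v_ds = V_DD - R_D * i_sat                 # load equation
    if v_ds >= v_ov:
        return "saturation", i_sat, v_ds
    # Linear region: inserting Equation 9.3 into V_DS = V_DD - R_D*I_D gives
    # (R_D*BETA/2)*V_DS^2 - (1 + R_D*BETA*v_ov)*V_DS + V_DD = 0; keep the smaller root.
    qa = R_D * BETA / 2
    qb = -(1 + R_D * BETA * v_ov)
    v_ds = (-qb - math.sqrt(qb ** 2 - 4 * qa * V_DD)) / (2 * qa)
    return "linear", (V_DD - v_ds) / R_D, v_ds


for v_g in (0.5, 2.79, 5.0):
    name, i_d, v_ds = operating_point(v_g)
    print(f"V_G={v_g:.2f} V: {name}, I_D={i_d * 1e3:.2f} mA, V_DS={v_ds:.2f} V")
```

Sweeping `v_g` from 0 to `V_DD` with this function gives the data needed for the plots requested in parts (d) and (e).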
EXAMPLE 9.2 DC RESPONSE #2 


This example presents a graphical solution (with a load line) for the problem seen above (similarly to 
what was done for a BJT-based circuit in Example 8.2). The circuit was repeated in Figure 9.6(a). 


a. Write the equation for the circuit’s DC load line and draw it. 


b. Calculate the coordinates for the point where the transistor enters the linear region and mark it on 
the load line. 


c. Check whether the transistor is in the linear region when the input voltage is maximum. Calcu- 
late the coordinates of this point and mark it on the load line. 


d. Highlight the section of the load line that belongs to the saturation region (called active region for 
BJTs) and comment on it. 


FIGURE 9.6. DC response of Example 9.2. 


SOLUTION 


Part (a): 

The load line equation involves the same signals employed in the I-V characteristics (Figure 9.6(b)), 
that is, I_D and V_DS. The simplest way to write such an equation is by going from V_DD to GND through 
the drain. Doing so for the circuit of Figure 9.6(a), V_DD = R_D·I_D + V_DS is obtained. To draw the load line, 
two points are needed; taking those for I_D = 0 (point A) and for V_DS = 0 (point E), the following results: 
A(I_D = 0, V_DS = V_DD) = A(0 mA, 5 V) and E(I_D = V_DD/R_D, V_DS = 0) = E(5 mA, 0 V). These two points 
correspond to the ends of the load line shown in Figure 9.6(b). 

Part (b): 
The intersection between the load line and the saturation curve (point C in Figure 9.6(b)) can be 
determined using Equations 9.5 and 9.6. To do so, the load line must be expressed in the form 
V_DS = a·I_D + b, hence a = -R_D and b = V_DD. With these values, Equation 9.6 produces V_DS = 1.79 V, from 
which I_D = (V_DD - V_DS)/R_D = 3.21 mA is obtained. The value of V_GS for this point, C(3.21 mA, 1.79 V), 
is 2.79 V, because V_DS = V_GS - V_T for all points of the saturation curve. 

Part (c): 
Because V_GS = 5 V > 2.79 V, the transistor is in the linear region. Equation 9.3 must then be employed, 
along with V_DS = V_DD - R_D·I_D, resulting in I_D = 4.4 mA and V_DS = 0.6 V. This point (D) is also shown 
in Figure 9.6(b). Note that, as expected, it falls on the intersection between the load line and the curve 
for V_GS = 5 V. 

Part (d): 
The saturation region spans the range 1.79 V ≤ V_DS ≤ 5 V or, equivalently, 0 mA ≤ I_D ≤ 3.21 mA. This 
portion of the load line is highlighted by a gray area in Figure 9.6(b). Note that it is proportionally 
poorer (shorter) than the active region for a BJT (compare Figures 9.6(b) and 8.6(b)). If this circuit 
were employed as a switching (digital) circuit, then it would jump back and forth between points 
A and D. However, if employed as a linear (analog) circuit, it would operate in the saturation region 
(point B, for example). 
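Point C of part (b) can also be obtained directly from the closed form of Equation 9.6. The sketch below assumes the same circuit values as the solution appears to use (R_D = 1 kΩ, V_DD = 5 V, V_T = 1 V, β = 2 mA/V^2); R_D and β are inferred from the quoted results rather than stated in the text, and the function name is hypothetical.

```python
import math


def saturation_triode_boundary(a, b, beta, v_t):
    """Equation 9.6: point where the load line V_DS = a*I_D + b meets the saturation curve.

    Returns (V_DS, V_GS, I_D). Assumes a resistive load, that is, a < 0 and b > 0.
    """
    v_ds = (1 - math.sqrt(1 - 2 * a * b * beta)) / (a * beta)
    v_gs = v_ds + v_t
    i_d = (v_ds - b) / a               # from the load line, Equation 9.5
    return v_ds, v_gs, i_d


# a = -R_D and b = V_DD, as in part (b); R_D and beta are assumed values
v_ds, v_gs, i_d = saturation_triode_boundary(a=-1e3, b=5.0, beta=2e-3, v_t=1.0)
print(f"C: I_D={i_d * 1e3:.2f} mA, V_DS={v_ds:.2f} V, V_GS={v_gs:.2f} V")
```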


9.5 CMOS Inverter 


The CMOS inverter was introduced in Section 4.2, where its operation, power consumption, and timing 
diagrams were described. Now that we know more about MOS transistors, other aspects can be exam- 
ined. More specifically, its construction, main parameters, DC response, and transition voltage will be 
seen in this section, while its transient response will be seen in Section 9.6. 

A CMOS inverter is shown in Figure 9.7(a). As seen in Section 4.2, CMOS stands for complementary 
MOS, meaning that for each nMOS transistor there also is a pMOS one. The CMOS inverter is therefore 
the smallest circuit in this family because it contains only one transistor of each type (identified as Mn 
for nMOS and Mp for pMOS). 

Its physical implementation is illustrated in the cross section of Figure 9.7(b), showing the nMOS on 
the left (constructed directly into the substrate) and the pMOS on the right (a large n-type region, called 
n-well, is needed to construct this transistor; this region acts as a substrate for pMOS transistors). Observe 
that the actual substrate is connected to GND, while the n-well is connected to V_DD (as explained earlier, 
the latter avoids the use of negative voltages because any voltage lower than V_DD will automatically look 
negative to the pMOS transistor). 

The operation of this circuit is as follows. Suppose that the threshold voltages of the nMOS and pMOS 
transistors are V_Tn = 0.6 V and V_Tp = -0.7 V, respectively, and that V_DD = 5 V. In this case, any V_I around 
or greater than 0.6 V will turn the nMOS transistor ON (because its source is connected to GND), while 
any voltage around or below 4.3 V will turn the pMOS transistor ON (because its source is connected to 
V_DD). Therefore, for V_I < 0.6 V, only the pMOS will be ON, which guarantees that the output node (with 
the parasitic capacitance C) gets fully charged to V_DD. Likewise, when V_I > 4.3 V, only the nMOS will be 
ON, hence completely discharging C and guaranteeing a true zero at the output. Note that when the 
input remains at '0' or '1', there is no static current flowing between V_DD and GND (because one of the 
transistors is necessarily OFF), which constitutes the main attribute of this kind of architecture (called 
CMOS logic). This means that the circuit consumes virtually no power while there is no activity at the 
input (except for very small leakage currents). 

FIGURE 9.7. CMOS inverter: (a) Circuit; (b) Cross section view. 

FIGURE 9.8. DC response of a CMOS inverter: (a) Operating regions and transition voltage; (b) Measurement 
of low/high input/output voltages. 

The DC response of this circuit is illustrated in Figure 9.8(a). Portion I of the transfer curve 
corresponds to V_I < V_Tn, so V_O = V_DD because only Mp is ON; it operates in linear mode because 
V_GSp = -5 V and V_DSp = 0 V. Portion II corresponds to V_Tn ≤ V_I < V_TR, so Mn and Mp are both ON. 
However, because V_I is small and V_O is still high, Mn operates in saturation mode, while Mp continues 
in linear mode. In portion III, both Mn and Mp operate in saturation, so the voltage gain is 
maximum and the transition is sharper (ideally, this portion of the curve should be vertical). Next, 
the circuit enters portion IV of the transfer curve, that is, V_TR < V_I ≤ V_DD + V_Tp, where V_O is low and 
V_I is high, so Mn changes to linear mode, while Mp remains in saturation. Finally, portion V occurs 
when V_I > V_DD + V_Tp, in which Mp is turned OFF, so Mn produces V_O = 0 V; Mn is in the linear region 
because V_I = 5 V and V_O = 0 V. 

An important design parameter related to the DC response described above is the transition voltage 
(V_TR), which is measured at 50% of the logic range, as depicted in Figure 9.8(a). Because at this point 
both transistors operate in saturation, the equation for V_TR can be easily obtained by equating the drain 
currents of Mn and Mp (using Equation 9.2), resulting in (see Exercise 9.14): 

$V_{TR} = [k(V_{DD} + V_{Tp}) + V_{Tn}]/(k + 1)$  (9.7) 

where k = (β_p/β_n)^{1/2}. This equation shows that, as expected, a strong nMOS (small k) pulls V_TR toward 
0 V, while a strong pMOS (large k) pulls it toward V_DD. If k = 1 and V_Tn = -V_Tp, then V_TR = V_DD/2. 
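As a sketch of how Equation 9.7 arises (the full treatment is left to Exercise 9.14), both drain currents are written with Equation 9.2 at V_I = V_TR. For the pMOS, V_GSp = V_TR - V_DD and V_Tp is negative; the square roots below are taken so that both sides are positive.

```latex
\begin{align*}
\frac{\beta_n}{2}\,(V_{TR}-V_{Tn})^2 &= \frac{\beta_p}{2}\,(V_{TR}-V_{DD}-V_{Tp})^2 \\
\sqrt{\beta_n}\,(V_{TR}-V_{Tn}) &= \sqrt{\beta_p}\,(V_{DD}+V_{Tp}-V_{TR}) \\
V_{TR}-V_{Tn} &= k\,(V_{DD}+V_{Tp}-V_{TR}), \qquad k = \sqrt{\beta_p/\beta_n} \\
V_{TR} &= \frac{k\,(V_{DD}+V_{Tp}) + V_{Tn}}{k+1}
\end{align*}
```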

The DC response is repeated in Figure 9.8(b) to illustrate how the low and high switching voltages 
are determined. The measurements are taken at the points where the tangent to the transfer curve is 
at 135°. The meanings of the voltages shown in the plots were introduced in Section 1.8 and will be 
seen again in Chapter 10 when discussing noise margins. 


EXAMPLE 9.3 CMOS INVERTER PARAMETERS 

Suppose that Mn and Mp in the circuit of Figure 9.7(a) are designed with (W/L)_n = 3/2 and 
(W/L)_p = 6/2 (in units of lambda), and are fabricated using a silicon process whose gate oxide 
thickness is t_ox = 100 Å and threshold voltages are V_Tn = 0.6 V and V_Tp = -0.7 V. Assuming that 
the effective mobility (in the channel) is ~50% of that in the bulk (see table in Figure 9.1(c)), 
determine: 

a. The value of β for the nMOS transistor (β_n). 
b. The value of β for the pMOS transistor (β_p). 
c. The inverter's transition voltage (V_TR). 

SOLUTION 

Part (a): 
With β_n = µ_n·C_ox·(W/L)_n, where µ_n = 700 cm^2/V·s (50% of Figure 9.1(c)), C_ox = ε_ox/t_ox, ε_ox = 3.9ε_0, and 
ε_0 = 8.85·10^-14 F/cm, β_n = 0.242(W/L)_n mA/V^2 results, hence β_n = 0.363 mA/V^2. 

Part (b): 
With β_p = µ_p·C_ox·(W/L)_p, where µ_p = 225 cm^2/V·s (50% of Figure 9.1(c)), β_p = 0.078(W/L)_p mA/V^2 
results, hence β_p = 0.234 mA/V^2. 

Part (c): 
k = (β_p/β_n)^{1/2} = 0.8. Therefore, using Equation 9.7 with V_DD = 5 V (as assumed earlier in this section), 
V_TR = 2.24 V is obtained. Note that, as expected, V_TR is below V_DD/2 because the nMOS transistor is 
stronger than the pMOS one (that is, β_n > β_p). 
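The three parts of this example chain together naturally in code. The following is a minimal sketch rather than a design tool: the function names are hypothetical, V_DD = 5 V is assumed (the example does not state it, but the quoted V_TR is consistent with it), and small differences from the printed values come from the rounding of intermediate results in the solution.

```python
import math

EPS_OX = 3.9 * 8.85e-14      # F/cm, permittivity of SiO2


def beta(mobility_cm2_per_vs, t_ox_angstrom, w_over_l):
    """beta = mu * C_ox * (W/L) in A/V^2, with C_ox = eps_ox / t_ox."""
    c_ox = EPS_OX / (t_ox_angstrom * 1e-8)    # F/cm^2 (1 angstrom = 1e-8 cm)
    return mobility_cm2_per_vs * c_ox * w_over_l


def transition_voltage(beta_n, beta_p, v_dd, v_tn, v_tp):
    """Equation 9.7, with k = sqrt(beta_p / beta_n)."""
    k = math.sqrt(beta_p / beta_n)
    return (k * (v_dd + v_tp) + v_tn) / (k + 1)


beta_n = beta(700, 100, 3 / 2)    # effective electron mobility, (W/L)_n = 3/2
beta_p = beta(225, 100, 6 / 2)    # effective hole mobility, (W/L)_p = 6/2
v_tr = transition_voltage(beta_n, beta_p, v_dd=5.0, v_tn=0.6, v_tp=-0.7)
print(f"beta_n={beta_n * 1e3:.3f} mA/V^2, beta_p={beta_p * 1e3:.3f} mA/V^2, V_TR={v_tr:.2f} V")
```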


9.6 Transient Response 


As mentioned earlier, transient response is the behavior of a circuit in the face of a large rapidly varying 
stimulus (generally a square pulse) from which the temporal performance of the circuit can be measured. 
This type of response is illustrated in Figure 9.9. In Figure 9.9(a), a CMOS inverter is shown with 
a load capacitance C_L, and in Figure 9.9(b) its typical response to a voltage pulse is depicted. The circuit 
was assumed to operate with V_DD = 5 V. For simple logic circuits, the transient response is dominated 
by the rise (t_r) and fall (t_f) times, so the propagation delays high-to-low (t_pHL) and low-to-high (t_pLH) are 
approximately t_pHL ≈ t_f/2 and t_pLH ≈ t_r/2. 

To determine t_f, suppose that v_I in Figure 9.9(b) has been at 0 V long enough for C_L to be fully charged 
(by Mp) to V_DD (= 5 V), as illustrated in Figure 9.9(c). Then the situation of Mn right before the pulse occurs 
(at time t_0-) is that depicted in Figure 9.9(d), where still v_O = 5 V and i_D = 0. However, at t_0+ (Figure 9.9(e)), 
Mn is turned ON, causing i_D > 0, which discharges the capacitor and consequently lowers v_O. Suppose 
that V_Tn = 0.6 V. Because v_I is now 5 V, only for v_O in the range 4.4 V ≤ v_O ≤ 5 V will Mn operate in the 
saturation region, after which it enters the linear region. In the former the current is large and constant 
(Equation 9.2), while in the latter it falls with v_O (Equation 9.3). To determine t_f, the current must be 
integrated in the range 4.5 V ≥ v_O ≥ 0.5 V (that is, between 90% and 10% of the static values, as shown in 
Figure 9.9(b)). After integration and some approximations, the equation below results for t_f. Because the 
circuit is symmetrical, the same reasoning can be applied for t_r. 

$t_f \approx 4C_L/(\beta_n V_{DD})$  (9.8) 
$t_r \approx 4C_L/(\beta_p V_{DD})$  (9.9) 

FIGURE 9.9. Transient response of a CMOS inverter. 


EXAMPLE 9.4 TRANSIENT RESPONSE OF A CMOS INVERTER 

As mentioned above, during the discharge of C_L in Figure 9.9 the current is higher (and constant) 
while Mn is in saturation. Determine new equations for t_f and t_r adopting the (“optimistic”) approximation 
that saturation indeed occurs during the whole discharge of C_L. The delays obtained here are 
obviously smaller than the actual values. Compare them to those obtained from Equations 9.8-9.9 
(which are also approximate). Assume that V_Tn = 0.6 V, V_Tp = -0.7 V, β_n = 2 mA/V^2, β_p = 1 mA/V^2, 
C_L = 10 pF, and V_DD = 5 V. 

SOLUTION 

This problem deals with the charge/discharge of a capacitor by a constant current. The accumulated 
charge in the capacitor when fully charged is Q = C_L·V_DD. Dividing both sides by t, and recalling that 
Q/t is current, t = C_L·V_DD/I results. In our particular problem, I = I_D = (β/2)(V_GS - V_T)^2 (Equation 9.2), so 
the time taken to completely discharge C_L is t = C_L·V_DD/[(β/2)(V_GS - V_T)^2]. With the values given above 
(for the nMOS, with V_GS = V_DD), t_f = 2.58 ns results. A similar analysis for the pMOS transistor (to charge 
the capacitor) produces t_r = 5.41 ns. The values from Equations 9.8 and 9.9 are t_f = 4 ns and t_r = 8 ns, 
respectively. 
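The comparison in this example is easy to reproduce numerically. The sketch below uses the values given in the example statement; the function names are hypothetical, and the constant-current estimate assumes |V_GS| = V_DD for the conducting transistor, as in the solution.

```python
def t_constant_current(c_l, v_dd, beta, v_t):
    """Optimistic delay: C_L swings by V_DD at the constant saturation current (Eq. 9.2)."""
    i_sat = (beta / 2) * (v_dd - abs(v_t)) ** 2    # |V_GS| = V_DD for the ON transistor
    return c_l * v_dd / i_sat


def t_approx(c_l, v_dd, beta):
    """Equations 9.8-9.9: t ~ 4*C_L / (beta * V_DD)."""
    return 4 * c_l / (beta * v_dd)


C_L, V_DD = 10e-12, 5.0
t_f_opt = t_constant_current(C_L, V_DD, beta=2e-3, v_t=0.6)     # nMOS discharges C_L
t_r_opt = t_constant_current(C_L, V_DD, beta=1e-3, v_t=-0.7)    # pMOS charges C_L
t_f_eq = t_approx(C_L, V_DD, beta=2e-3)
t_r_eq = t_approx(C_L, V_DD, beta=1e-3)
print(f"constant current: t_f={t_f_opt * 1e9:.2f} ns, t_r={t_r_opt * 1e9:.2f} ns")
print(f"Equations 9.8-9.9: t_f={t_f_eq * 1e9:.2f} ns, t_r={t_r_eq * 1e9:.2f} ns")
```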


Power-delay product 

The power-delay (PD) product is an important measure of circuit performance because it combines 
information related to the circuit's speed with its corresponding power consumption. A discussion on 
the PD product of a CMOS inverter, which extends to CMOS logic in general, is presented in Section 4.2. 

9.7 AC Response 


The response of a circuit to a low-amplitude sinusoidal signal is called AC response, which is a fundamental 
measure of the frequency response of linear circuits. Even though it is related to analog rather than to 
digital circuits (except for f_T, which indirectly applies to digital too), a brief description is presented 
below. 

To obtain the AC response of circuits employing MOSFETs, the procedure is exactly the same as that 
seen for circuits with BJTs in Section 8.6, that is, each transistor must be replaced with the corresponding 
small-signal model, after which the frequency-domain equations are derived, generally written in factored 
form of poles and zeros such that the corresponding corner frequencies and Bode plots can be easily 
obtained. The most common parameters obtained from the AC response are voltage and current gain, 
input impedance, and output impedance. 

The MOSFET’s small-signal model is depicted in Figure 9.10. In Figure 9.10(a), the model for low and medium frequencies (no capacitors) is depicted, while Figure 9.10(b) exhibits the high-frequency model (which includes the gate-source and gate-drain parasitic capacitors). The dependent current source is controlled by the transconductance $g_m$ (not to be confused with $g_m$ of BJTs).

In summary, the AC response of a MOSFET is determined by four parameters: $C_{gs}$, $C_{gd}$, $r_d$, and $g_m$ (Figure 9.10(b)). $C_{gs}$ is proportional to the gate capacitance ($C_{ox}WL$); $C_{gd}$ is proportional to the gate-drain overlap, and even though its value is small it strongly affects the high-frequency response; $r_d$ is inversely proportional to the channel-length modulation (Early effect), so for long transistors it is normally negligible (typically over 50 kΩ); finally, $g_m$ can be determined as explained below.

If $r_d \approx \infty$ in Figure 9.10(a), then $i_d = g_m v_{gs}$, that is, $g_m = i_d/v_{gs}$. Because $i_d$ and $v_{gs}$ are small AC signals, $i_d/v_{gs}$ can be obtained by taking the derivative of $I_D$ with respect to $V_{GS}$, that is, $i_d/v_{gs} = dI_D/dV_{GS}$. Using Equation 9.2, the following results:

$$g_m = (2\beta I_D)^{1/2} \tag{9.10}$$


A last MOSFET parameter, which relates three of the parameters seen above ($g_m$, $C_{gs}$, and $C_{gd}$), is called transition frequency ($f_T$), and it has a meaning similar to $f_T$ for BJTs (Equation 8.5). Note in Figure 9.10(b) that, contrary to Figure 9.10(a), the input current ($i_g$) is no longer zero, and it grows with frequency because the reactances ($X_C = 1/(2\pi fC)$) of the capacitors decrease. Defining the transition frequency as that at which the magnitude of the current gain $|i_d/i_g|$, under ideal conditions (short-circuited output), is reduced to unity, the following equation is obtained:

$$f_T = \frac{g_m}{2\pi(C_{gs} + C_{gd})} \tag{9.11}$$

FIGURE 9.10. MOSFET's small-signal model (a) for low and medium frequencies and (b) for high frequencies.
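As a rough numeric illustration of Equations 9.10 and 9.11, the sketch below evaluates $g_m$ and $f_T$ for one bias point. The device values (β, $I_D$, $C_{gs}$, $C_{gd}$) are hypothetical, chosen only to show orders of magnitude; they do not come from the text.

```python
import math


def transconductance(beta, i_d):
    """Equation 9.10: g_m = sqrt(2 * beta * I_D), transistor in saturation."""
    return math.sqrt(2.0 * beta * i_d)


def transition_frequency(g_m, c_gs, c_gd):
    """Equation 9.11: f_T = g_m / (2 * pi * (C_gs + C_gd))."""
    return g_m / (2.0 * math.pi * (c_gs + c_gd))


# Hypothetical device values, for illustration only.
BETA = 2e-3     # transconductance factor, A/V^2
I_D = 1e-3      # bias current, A
C_GS = 10e-15   # gate-source capacitance, F
C_GD = 2e-15    # gate-drain capacitance, F

g_m = transconductance(BETA, I_D)
f_t = transition_frequency(g_m, C_GS, C_GD)

print(f"g_m = {g_m * 1e3:.2f} mS")
print(f"f_T = {f_t / 1e9:.1f} GHz")
```

Because $g_m$ grows only with the square root of $I_D$, quadrupling the bias current doubles $f_T$ in this model.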
9.8 Modern MOSFETs


Similarly to what was done for the BJT in Section 8.7, we conclude this chapter by describing some
modern MOSFET construction approaches for enhanced speed. They are:


■ Strained MOSFETs
■ SOI MOSFETs


9.8.1 Strained Si-SiGe MOSFETs 


Besides geometry downsizing, another technique employed to increase the performance of MOSFETs is semiconductor straining, which consists of enlarging or compressing the atomic distance of the semiconductor that constitutes the channel. This technique is very recent (introduced by IBM and others in 2001) and has already been incorporated into several 90 nm and 65 nm devices.

Its basic principle is very simple and is illustrated in Figure 9.11(a). In this case, the preexisting layer ($\mathrm{Si}_{1-x}\mathrm{Ge}_x$) has a larger atomic distance than the epitaxial layer (Si) that will be grown on top of it. If the epitaxial layer is kept thin (below an experimentally determined critical thickness, which depends on $x$, usually <50 nm), its atoms align with the underlying layer, hence resulting in a “stretched” (strained) Si layer. Because of reduced charge scattering and lower effective mass in the direction parallel to the interface, electrons travel faster in this structure. The underlying layer does not change its atomic distance, so it is called a relaxed layer, while the epitaxial layer is called a strained layer. Note that, as the epitaxial layer stretches horizontally, elastic forces cause it to compress slightly vertically.

An n-channel MOSFET constructed using this principle is shown in Figure 9.11(b). As seen in the study of HBTs, the lattice constant of Ge ($a$ = 5.646 Å) is bigger than that of Si ($a$ = 5.431 Å), so the atomic distance of a thick $\mathrm{Si}_{1-x}\mathrm{Ge}_x$ ($x$ > 0) film is bigger than that of Si alone, causing the Si epitaxial layer to stretch. Common values of $x$ in the relaxed SiGe layer are 0.2 to 0.3, while the epitaxial layer thickness is typically 10 nm to 20 nm, from which an increase of around 10% to 20% in electron mobility results.
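To get a feel for how much the Si layer is stretched, the sketch below estimates the lattice constant of the relaxed layer and the resulting in-plane strain for the quoted range of $x$. It assumes a linear interpolation between the Si and Ge lattice constants (Vegard's law), which is a first-order approximation not stated in the text; the function names are illustrative.

```python
A_SI = 5.431  # Si lattice constant, angstroms (value from the text)
A_GE = 5.646  # Ge lattice constant, angstroms (value from the text)


def sige_lattice_constant(x):
    """Lattice constant of relaxed Si(1-x)Ge(x), assuming Vegard's law."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    return (1.0 - x) * A_SI + x * A_GE


def tensile_strain(x):
    """In-plane strain of a thin Si film aligned to relaxed Si(1-x)Ge(x)."""
    return (sige_lattice_constant(x) - A_SI) / A_SI


for x in (0.2, 0.3):
    print(f"x = {x}: strain = {100 * tensile_strain(x):.2f} %")
```

Under this assumption the strain is on the order of 1% for $x$ between 0.2 and 0.3.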


FIGURE 9.11. (a) Illustration of semiconductor-strain phenomenon; (b) Tensile strained Si-SiGe nMOS transistor; (c) Compressive strained Si-SiGe pMOS transistor.
The strain described above expands the atomic distances, so it is called tensile strain. Even though it
has a positive effect on electrons, the opposite occurs with holes, which exhibit more mobility when the
structure is compressed. Therefore, for p-channel MOSFETs, compressive strain is needed. One approach
(introduced by Intel) is depicted in Figure 9.11(c), where the source and drain diffusions are made of
SiGe. If the channel is very short, the net result is a compressed Si lattice in the channel, which increases
the hole mobility by around 20% to 25%.

A concluding remark regards the use of materials other than SiGe to strain the channel. For
example, a high-stress silicon nitride ($\mathrm{Si_3N_4}$) cap layer (covering basically the whole transistor) has been
successfully used by Intel to induce tensile strain in the channel of nMOS devices.


9.8.2 SOI MOSFETs 


SOI (silicon-on-insulator) devices are constructed over a buried oxide (insulator) layer. The most common insulators are $\mathrm{Al_2O_3}$ (sapphire), $\mathrm{Si_3N_4}$ (silicon nitride), and more recently $\mathrm{SiO_2}$ (silicon dioxide). A simplified view of an SOI chip utilizing $\mathrm{SiO_2}$ as insulator is depicted in Figure 9.12, where an nMOS and a pMOS transistor can be observed.

This approach has several benefits. Due to individual device isolation, wells are no longer needed to construct pMOS transistors, hence leading to denser devices. It also reduces the drain and source parasitic capacitances, therefore improving speed. Additionally, the latch-up phenomenon typical of conventional MOSFETs is eliminated. Finally, the small Si volume for electron-hole pair generation makes it appropriate for radiation-intense environments (space applications). Like the strained MOSFETs described above, SOI MOSFETs have already been incorporated into many high-performance deep-submicron devices.


9.8.3 BiCMOS Technologies 


A BiCMOS process is one that allows the fabrication of both types of transistors (BJT and MOS) in 
the same chip. This is desirable because their distinct characteristics can be combined to achieve better 
design solutions. Such designs include high-speed logic systems where large currents must be delivered 
and, especially, high-speed systems that combine analog and digital circuits on the same die. A good 
example is the implementation of wireless systems operating in the microwave range, where BJTs are 
needed for the RF and other analog sections, while CMOS is employed in the digital and mixed-mode 
parts. A BiCMOS process can include any combination between BJT- and MOSFET-based technologies, 
from conventional (Sections 8.2 and 9.2) to advanced (Sections 8.7 and 9.8). 


FIGURE 9.12. Cross section of n- and p-channel SOI MOSFETs.
9.9 Exercises

1. MOSFET substrate versus MOSFET channel

Why is a MOSFET constructed in a p-doped substrate (Figure 9.2(a)) called n-channel and in an
n-doped substrate (Figure 9.2(b)) called p-channel?


2. MOS technology

a. When it is said that a certain technology is “65nm CMOS,” what does it mean?

b. Check the data sheets for the FPGAs Stratix III (from Altera) and Virtex 5 (from Xilinx) and confirm that
they are manufactured with 65nm CMOS technology. What is the supply voltage for these chips?

c. Check the same parameters (technology node and supply voltage) for the previous versions of
these devices (Stratix II and Virtex 4).


3. PMOS operation

a. Make a sketch like that in Figure 9.3 for a pMOS transistor, with the substrate connected to ground,
then use it to explain how this transistor works. In this case, must the gate voltage to turn it ON be
positive or negative? What types of carriers (electrons or holes) are needed to form the channel?

b. Repeat the exercise above for the substrate connected to the positive supply rail ($V_{DD}$). In this
case, does the gate voltage to turn the transistor ON need to be negative?


4. I-V characteristics

a. Make a sketch for the I-V characteristics (similar to that in Figure 9.4) for an nMOS transistor with β = 5 mA/V² and $V_T$ = 0.6 V. Include in your sketch curves for the following values of $V_{GS}$: 0.6 V, 1.1 V, 1.6 V, 2.1 V, and 2.6 V.

b. Also draw the saturation curve.

c. Suppose that a circuit has been designed with the transistor operating with the load line $V_{DD} = R_D I_D + V_{DS}$, where $V_{DD}$ = 10 V and $R_D$ = 1 kΩ. Include this load line in your sketch.

d. Calculate the coordinates for the point of the load line that falls on the limit between the saturation and triode regions (Equation 9.6), then mark it in your sketch.

e. What are the ranges for $V_{DS}$ and $I_D$ in the saturation region?

f. Calculate the coordinates for the point where the transistor rests when $V_{GS}$ = 1.8 V, then mark it in your sketch. Is this point in the triode or saturation region?


5. β parameter

Calculate the value of β for a pMOS silicon transistor whose gate oxide thickness is 120 Å and
whose channel width and length are 2 μm and 0.4 μm, respectively. Consider that the carrier mobility
in the channel is 50% of that in the bulk.


6. DC response #1

A slowly varying voltage 0 V ≤ $V_I$ ≤ 5 V is applied to the circuit of Figure E9.6.

a. For which range of $V_I$ is the transistor in the cutoff region?

b. For which range of $V_I$ is the transistor in the saturation region?

c. For which range of $V_I$ is the transistor in the triode/linear region?

d. Verify the correctness of the $I_D$ plot in Figure E9.6(b).

e. Verify the correctness of the $V_O$ plot in Figure E9.6(b).

FIGURE E9.6. (a) Circuit, with $V_{DD}$ = 5 V, $V_T$ = 0.5 V, β = 0.4 mA/V², and $R_D$ = 7.8 kΩ; (b) $V_O$ (V) and $I_D$ (mA) plotted against $V_I$ (V).


7. DC response #2

Redo Example 9.1 for β = 1 mA/V² and $R_D$ = 1.8 kΩ.

8. DC response #3

Redo Example 9.1 for $R_D$ = 2.7 kΩ and $V_{DD}$ = 10 V.
9. DC response #4

The questions below refer to the circuit in Figure E9.9.

a. Calculate $I_D$. (Suggestion: Assume that the transistor is operating in saturation (so employ Equation 9.2), then afterwards check whether your assumption was true, that is, if $V_{DS} \geq V_{GS} - V_T$.)

b. Calculate $V_{DS}$.

c. Draw a load line for this circuit as in Example 9.2 (the I-V curves are not needed).

d. Calculate the point ($I_D$, $V_{DS}$) at the intersection between the triode and saturation regions (Equation 9.6), then mark it in your sketch. Does it fall on the load line?

e. What are the ranges for $V_{DS}$ and $I_D$ in the saturation region?

f. Include in your sketch the point ($I_D$, $V_{DS}$) calculated in parts (a)-(b). Is the transistor operating in saturation or triode mode?

FIGURE E9.9. Circuit parameters: $V_{DD}$ = 12 V, $V_T$ = 1 V, β = 4 mA/V², $R_1$ = 90 kΩ, $R_2$ = 30 kΩ, $R_D$ = 1 kΩ.


10. Drain current in saturation mode

Prove that, when the transistor in the self-biased circuit of Figure E9.10 operates in saturation mode, its drain current is given by Equation 9.12 below, where $V_G$ is determined by Equation 9.13.

$$I_D = \frac{1 + \beta R_S (V_G - V_T) - [1 + 2\beta R_S (V_G - V_T)]^{1/2}}{\beta R_S^2} \tag{9.12}$$

$$V_G = \frac{V_{DD} R_2}{R_1 + R_2} \tag{9.13}$$

FIGURE E9.10.


11. DC response #5

Say that the circuit of Figure E9.10 is constructed with $V_{DD}$ = 12 V, $R_1$ = 200 kΩ, $R_2$ = 100 kΩ, $R_D$ = 2 kΩ, and $R_S$ = 500 Ω. Assume also that the transistor exhibits $V_T$ = 1 V and β = 4 mA/V². Do the following:

a. Calculate $V_G$ and $I_D$ (see suggestion in Exercise 9.9 and also Equations 9.12-13).

b. Calculate $V_S$ and $V_{DS}$.

c. Write the equation for the load line of this circuit (note the presence of $R_S$), then draw it similarly to what was done in Example 9.2 (the I-V curves are not needed).

d. Calculate the point ($I_D$, $V_{DS}$) at the intersection between the triode and saturation regions (Equation 9.6), then mark it in your sketch.

e. What are the ranges for $V_{DS}$ and $I_D$ in the saturation region?

f. Include the point ($I_D$, $V_{DS}$) calculated in parts (a)-(b) in your sketch. Is the transistor in this circuit operating in saturation or triode mode?


12. MOS triode-saturation boundary

Prove Equation 9.6.

13. MOS transconductance parameter

Prove Equation 9.10.
14. CMOS inverter #1

Prove Equation 9.7 for the transition voltage of a CMOS inverter. (Hint: Equate the currents of Mn and Mp with $V_I = V_O$ so both transistors are in saturation.)

15. CMOS inverter #2

Consider the CMOS inverter shown in Figure E9.15. Suppose that Mn was designed with a minimum size, that is, $(W/L)_n = 3\lambda/2\lambda$, and that the pMOS transistor was designed with a minimum length, $L_p = 2\lambda$.

a. Assuming that the mobility of electrons in the channel is about three times that of holes, what must the width of the pMOS transistor ($W_p$) be for the two transistors to exhibit the same transconductance factors (that is, $\beta_n = \beta_p$)?

b. Intuitively, without inspecting Equation 9.7, would you expect the transition voltage to change toward 0 V or $V_{DD}$ when $(W/L)_n$ is increased with respect to $(W/L)_p$? Explain.

c. If $\beta_n > \beta_p$, assuming that $V_{Tn} = |V_{Tp}|$, do you expect the transition voltage to be higher or lower than $V_{DD}/2$?

FIGURE E9.15.


16. CMOS inverter #3

Consider again the CMOS inverter of Figure E9.15. When $V_I$ changes and the circuit passes through the point of transition (that is, $V_I = V_O = V_{TR}$), prove that the current is given by Equation 9.14 below.

$$I_D = \frac{\beta_n \beta_p (V_{DD} + V_{Tp} - V_{Tn})^2}{2(\beta_n^{1/2} + \beta_p^{1/2})^2} \tag{9.14}$$

17. CMOS inverter #4

Suppose that Mn and Mp in the CMOS inverter of Figure E9.15 are designed with $(W/L)_n$ = 3/2 and $(W/L)_p$ = 8/2 (in units of lambda) and are fabricated using a silicon process whose gate oxide thickness is $t_{ox}$ = 150 Å. Assuming that the effective mobility (in the channel) is ~50% of that in the bulk, calculate the values of $\beta_n$ and $\beta_p$.


18. CMOS inverter #5

Consider the CMOS inverter shown in Figure E9.18. Using the values given for the parameters, determine the fall and rise times and then make a sketch of the transient response (a sketch of $v_O$ as a function of $v_I$). Try to draw it as much to scale as possible, assuming that each horizontal division in the plots is 5 ns (thus the 5 V part of $v_I$ lasts 30 ns).

FIGURE E9.18.


19. nMOS inverter #1 


The inverter in Figure E9.19 is called nMOS inverter because it employs only nMOS transistors. Its
main disadvantage with respect to the CMOS inverter is that it does consume static power. Explain
when that happens. (In other words, for $V_I$ = '0' or '1' does $I_D \neq 0$ occur?)


FIGURE E9.19. 


20. nMOS inverter #2

The questions below still regard the nMOS inverter of Figure E9.19.

a. Why is transistor M2 always ready to conduct and can only operate in saturation mode?

b. When $V_I$ = 0 V (= '0'), is M1 ON or OFF? Does this cause $V_O$ to be high or low? Why?

c. Explain why M1 and M2 are both ON when $V_I = V_{DD}$ (assume 3.3 V), hence $I_D \neq 0$ results.

d. Prove that in (c) above the output voltage is given by Equation 9.15 below, where $\beta_1$ and $\beta_2$ are the transconductance factors of transistors M1 and M2, respectively. (Hint: Equate the currents of M1 and M2, with M1 in triode (Equation 9.3) and M2 in saturation (Equation 9.2).)

$$V_O = (V_{DD} - V_T)\{1 - [\beta_1/(\beta_1 + \beta_2)]^{1/2}\} \tag{9.15}$$

e. If $\beta_1 = 3\beta_2$, $V_{DD}$ = 3.3 V, and $V_T$ = 0.5 V, calculate $V_O$ when $V_I = V_{DD}$.


f. In (e) above, check whether M1 is indeed in triode mode and M2 is in saturation.

g. What is the value of $I_D$ in (e) above (assume that $\beta_1$ = 0.1 mA/V²)? Using the voltage $V_O$ obtained in (e), calculate $I_D$ using the expression for M1 (triode) as well as for M2 (saturation). Are the results alike?

21. nMOS inverter #3

The questions below refer to the nMOS inverter of Figure E9.19.

a. The transition voltage is normally defined as the input voltage that causes the output voltage to reach 50% (midpoint) between the two logic voltages (GND and $V_{DD}$). Adopting this definition, prove that the transition voltage is given by Equation 9.16 below, where $k = (\beta_2/\beta_1)^{1/2}$. (Hint: Equate the currents of M1 and M2 with both in saturation and with $V_O = V_{DD}/2$.)

$$V_{TR} = kV_{DD}/2 + (1 - k)V_T \tag{9.16}$$

b. Another definition for the transition voltage, particularly useful when the actual '0' and '1' voltages are not true-rail voltages (as in the present circuit, where $V_O$ is never truly 0 V), is that in which $V_O = V_I$. If this definition is adopted, then prove that $V_{TR}$ is given by Equation 9.17, where again $k = (\beta_2/\beta_1)^{1/2}$.

$$V_{TR} = [kV_{DD} + (1 - k)V_T]/(1 + k) \tag{9.17}$$

c. If $\beta_1 = 3\beta_2$, $V_{DD}$ = 5 V, and $V_T$ = 0.6 V, calculate $V_{TR}$ in both cases above.
Logic Families and I/Os


Objective: This chapter describes the main logic families, including BJT-based as well as MOS-based architectures. In the former, DTL, TTL, and ECL are included. In the latter, not only is the traditional CMOS architecture presented but also pseudo-nMOS logic, transmission-gate logic, footed and unfooted dynamic logic, domino logic, C²MOS logic, and BiCMOS logic. Additionally, a section describing modern I/O standards, necessary to access such ICs, is also included, in which LVCMOS, SSTL, HSTL, and LVDS I/Os are described, among others.


Chapter Contents 
10.2 Diode-Transistor Logic
10.10 Exercises


10.1. BJT-Based Logic Families 


The first part of this chapter (Sections 10.2 to 10.4) describes logic circuits constructed with BJTs (bipolar 
junction transistors). 

Digital ICs sharing the same overall circuit architecture and electrical specifications constitute a 
digital logic family. The first digital families (DTL, TTL, and ECL) were constructed with BJTs. However, 
the large silicon area required to construct the components of these families (transistors plus associated 
resistors), and especially their high power consumption, led to their almost complete replacement with 
MOSFET-based families. 

In spite of the limitations above, the analysis of BJT-based families is important because it helps us 
to understand the evolution of digital technology. Moreover, one of the BJT-based families, called ECL 
(emitter-coupled logic), is still in use and is the fastest of all logic circuits. 
The following BJT-based logic families are described in the sections that follow:

■ DTL (diode-transistor logic)
■ TTL (transistor-transistor logic)
■ ECL (emitter-coupled logic)


10.2 Diode-Transistor Logic 


The first BJT-based logic family was DTL, which was developed in the 1950s. However, because its 
circuits involve diodes, a brief recap of diodes is presented first. 

An experiment, showing the measurement of a diode’s I-V characteristic, is depicted in Figure 10.1. The circuit is shown in Figure 10.1(a), with the diode submitted to a slowly varying DC voltage $V_{test}$, which causes the current $I_D$ and voltage $V_D$ through/across the diode. In Figure 10.1(b), the measured values of $I_D$ and $V_D$ are plotted. When the diode is forward biased (that is, $V_D$ is positive), a large current flows after $V_D$ reaches the diode’s junction voltage, $V_j$ (~0.7 V for silicon diodes). Due to the sudden increase of $I_D$, the series resistor employed in the experiment is essential to limit the current, otherwise the diode could be damaged. On the other hand, when $V_D$ is negative, practically no current flows through it (given that $V_D$ is kept below the diode’s maximum reverse voltage, $V_{Rmax}$, which is many volts). In summary, roughly speaking, the diode is a short circuit (though with 0.7 V across it) when forward biased or an open circuit when reverse biased.

The DTL family was derived from a previous family, called DL (diode logic), which employed only 
resistors and diodes. Two DL gates are illustrated in Figures 10.2(a)-(b), which compute the AND and 
OR functions, respectively. 

The AND gate constructed with DL logic of Figure 10.2(a) operates as follows. Suppose that $a$ = 0 V; then diode $D_1$ is forward biased (through the path $V_{CC}$-$R$-$D_1$-$a$), so the voltage across it is 0.7 V (= $V_j$), causing $y$ = 0.7 V = '0' at the output. Note that this situation holds for $a$ or $b$ or both low. On the other hand, if both inputs are high (= $V_{CC}$), the diodes are both OFF, so no current can flow through $R$, resulting in $y = V_{CC}$ = '1'. It is obvious that when a load is connected to node $y$, the node’s voltage gets reduced; calling $I_L$ the load current, $y = V_{CC} - RI_L$ results. A similar analysis can be made for the OR gate of Figure 10.2(b), which is left to the reader.
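The behavior just described for the DL AND gate can be captured in a few lines. This is a simplified sketch under stated assumptions: each conducting diode is modeled as a constant 0.7 V drop, and the resistor value and load current used in the example calls are hypothetical.

```python
V_J = 0.7  # assumed constant forward drop of a conducting diode, V


def dl_and_output(a, b, v_cc=5.0, r=1e3, i_load=0.0):
    """Output voltage y of the two-input DL AND gate of Figure 10.2(a).

    a, b   : input voltages, V
    r      : pull-up resistor, ohms (hypothetical value)
    i_load : current drawn by a load connected to node y, A
    """
    pulled_up = v_cc - r * i_load   # level set by R when both diodes are OFF
    clamped = min(a, b) + V_J       # level set by the diode on the lowest input
    return min(pulled_up, clamped)


for a, b in ((0.0, 0.0), (0.0, 5.0), (5.0, 0.0), (5.0, 5.0)):
    print(f"a = {a} V, b = {b} V -> y = {dl_and_output(a, b):.1f} V")

# With a 1 mA load and both inputs high, y drops by R * I_L.
print(f"loaded, both high: y = {dl_and_output(5.0, 5.0, i_load=1e-3):.1f} V")
```

Only the last input combination leaves both diodes OFF, which is what makes the circuit an AND gate.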

We turn now to the DTL family. Its construction consists of a DL circuit followed by a bipolar transistor, with the latter providing the needed current gain and circuit isolation.


FIGURE 10.1. I-V characteristic of a common diode: (a) Experimental circuit; (b) Measurements.

FIGURE 10.2. DL and DTL architectures: (a) DL AND gate; (b) DL OR gate; (c) DTL NAND gate.


An example of a DTL gate is shown in Figure 10.2(c) (note the similarity between this circuit and that in Figure 8.7), in which the DL circuit is the AND gate of Figure 10.2(a). Therefore, given that a common-emitter BJT is inherently an inverter (because when the base voltage grows, the collector voltage decreases), a NAND results from this association (AND + inverter). This can also be understood by simply observing the voltage on node $x$; when it is high, the BJT is turned ON, so $y = V_{CEsat}$ (~0.2 V) results, while a low voltage on $x$ turns the transistor OFF, raising the voltage of $y$ towards $V_{CC}$.

DTL circuits are slow, require high-value resistors (thus consuming large silicon space), and dissipate 
considerable static power. For these reasons, they were never adequate for integrated circuits. 


10.3  Transistor-Transistor Logic (TTL) 


The second bipolar family, called TTL, was developed in the 1960s. It was the first built on ICs and 
attained enormous commercial success. 


10.3.1 TTL Circuit 


A TTL example is depicted in Figure 10.3, which shows a NAND gate. The circuit contains four BJTs, where $Q_1$ is a multiemitter transistor (similar to that in Figure 8.3(a) but with two electrical connections to the emitter region). The operation of this gate is described below.

When $a$ and $b$ are both high, $Q_1$ is turned OFF, so $V_{CC}$ is applied to the base of $Q_2$ through $R_1$ and the base-collector junction of $Q_1$, turning $Q_2$ ON. With $Q_2$ ON, current flows through $R_3$, raising the base voltage of $Q_4$, which is then turned ON. This current, also flowing through $R_2$, lowers the collector voltage of $Q_2$, keeping $Q_3$ OFF. In summary, due to $Q_3$ OFF and $Q_4$ ON, the output voltage is low (as seen in Chapter 8, this voltage is not truly 0 V, but $V_{CEsat}$ (~0.2 V)).

When $a$ or $b$ or both are low, the base-emitter junction of $Q_1$ is forward biased, causing its collector voltage to be lowered toward $V_{CEsat}$ (~0.2 V), hence turning $Q_2$ OFF. With no current flowing through $R_2$ and $R_3$, the voltage at the base of $Q_3$ is raised, turning it ON, while the voltage at the base of $Q_4$ is 0 V, turning it OFF. In summary, now $Q_3$ is ON and $Q_4$ is OFF, raising the output voltage toward $V_{CC}$, which completes the NAND function.

FIGURE 10.3. NAND circuit constructed with TTL logic.


FIGURE 10.4. Examples of TTL chips (7400 and 7404). 


Note above the following major limitation of TTL: When $y$ = '1', the output voltage, instead of true $V_{CC}$, is $y = V_{CC} - V_{R2} - 2V_j$ (where $V_{R2}$ is the voltage drop on $R_2$ caused by the base current of $Q_3$, and $V_j \approx 0.7$ V is the junction voltage of $Q_3$ and also of the series diode). Consequently, given that this family operates with a nominal supply $V_{CC}$ = 5 V, $y$ < 3.6 V results at the output. In summary, the TTL family has a relatively fine '0' (= $V_{CEsat}$) but a poor '1' (<3.6 V).

The TTL architecture gave origin to the very popular 74-series of logic ICs, which includes all kinds 
of basic circuits. For example, the 7400 chip contains four NAND gates; the 7401, four open-collector 
NAND gates; the 7402, four NOR gates; and so on. 

Many TTL chips are constructed with 14 or 16 pins, encased using the DIP (dual in-line package)
option seen in Figure 1.10. Two examples (with 14 pins) are shown in Figure 10.4. On the left, the 7400 IC
is depicted, which contains four 2-input NAND gates, while the 7404 IC, with six inverters, is seen on the
right. Pins 7 and 14 are reserved for GND (0 V) and $V_{CC}$ (5 V).


10.3.2 Temperature Ranges 


Chips from the 74-series are offered for two temperature ranges, called commercial and industrial. The 
74-series also has an equivalent series, called 54, for operation in the military temperature range (for each 
chip in the 74-series there also is one in the 54-series). In summary, the following is available: 


■ Commercial temperature range (0°C to 70°C): 74-series.
■ Industrial temperature range (−40°C to 85°C): 74-series.
■ Military temperature range (−55°C to 125°C): 54-series.


10.3.3 TTL Versions


To cope with the low speed and high power consumption of the original TTL family, several improvements were introduced throughout the years. The complete list is presented below (where 74 can be replaced with 54).


■ 74: Standard TTL
■ 74S: Schottky TTL
■ 74AS: Advanced Schottky TTL
■ 74LS: Low-power Schottky TTL
■ 74ALS: Advanced low-power Schottky TTL
■ 74F: Fast TTL

To identify the technology employed in a particular chip, the representation 74XXxx (or 54XXxx) is 
adopted, where XX is the technology and xx is the circuit. For example, the 7400 chip shown in Figure 10.4(a) 
is identified as 7400 (for standard TTL), 74S00 (Schottky TTL), 74AS00 (Advanced Schottky TTL), etc. 
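The 74XXxx naming rule lends itself to a small decoder. The sketch below handles only the six TTL versions listed in this section; the function name and the returned tuple layout are illustrative choices, and real part markings often carry manufacturer prefixes and package suffixes that are not handled here.

```python
import re

# Technology codes listed in this section ("" stands for standard TTL).
TECHNOLOGIES = {
    "": "Standard TTL",
    "S": "Schottky TTL",
    "AS": "Advanced Schottky TTL",
    "LS": "Low-power Schottky TTL",
    "ALS": "Advanced low-power Schottky TTL",
    "F": "Fast TTL",
}

# Series (74 or 54), optional technology code, then the circuit number.
PART_PATTERN = re.compile(r"^(74|54)(ALS|AS|LS|S|F)?(\d+)$")


def decode_part(name):
    """Split a part name such as '74LS00' into (series, technology, circuit)."""
    match = PART_PATTERN.match(name.upper())
    if match is None:
        raise ValueError(f"not a recognized 74/54-series TTL part: {name!r}")
    series, tech, circuit = match.groups()
    return series, TECHNOLOGIES[tech or ""], circuit


for part in ("7400", "74S00", "74AS00", "54ALS04"):
    print(part, "->", decode_part(part))
```

Names using technology codes outside the list above (for example, CMOS members of the 74 series) are rejected with a `ValueError` rather than guessed at.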

To construct the circuits for the 74S version, the starting point was the standard TTL circuits (like that in Figure 10.3), to which Schottky (clamp) diodes (described in Chapter 8; see Figure 8.8(c)) were added. Such a diode exhibits a low junction voltage ($V_{j\text{-Schottky}} \approx 0.4$ V), thus preventing the collector voltage from going below ~0.3 V ($V_{CEmin} = V_{BE} - V_{j\text{-Schottky}} \approx$ 0.7 V − 0.4 V = 0.3 V), hence avoiding deep saturation. Because the less a transistor saturates, the faster it is turned OFF, the resulting turn-off time (which is usually larger than the turn-on time) is reduced.

A comparison between the TTL versions is presented in Figure 10.5. Note, for example, the different 
current capacities, speeds, and power consumptions. For example, the last TTL version (F) has the lowest 
delay-power product among the fast versions. Other parameters from this table will be examined below. 


Parameter Symbol TTL versions Unit 


Nominal supply voltage Voc | 5+0.5 | 5+0.5 | 5+0.5 5+0.5 | V 
Minimum input high voltage Mec 2 2 2 Sie ae aD ees 


Maximum input low voltage Vit 0.8 0.8 
| Minimum output high voltage Vou 2.4 a f = yf 
Maximum output low voltage VoL 0.4 0.5 0.5 0.5 0.5 


Maximum input high current hin 40 50 
| Maximum input low current (*) ti 6 | 2 | 08 
Maximum output high current (*) lon -0.4 -1 


Maximum output low current lot 16 20 
Fan-out (LS loads) 20 50 


Typ. power consumption per gate P 10 20 10 2 15 5 uW 


(*) Minus sign means that the current leaves the chip.


FIGURE 10.5. Main parameters of TTL technologies used in the 74/54 series of digital ICs. 
FIGURE 10.6. Illustration of fan-out: (a) Output high (the gate sources up to 0.4 mA to other gates); (b) Output
low (the gate sinks up to 16 mA from other gates). In this example (conventional TTL) the fan-out is 10.


10.3.4 Fan-In and Fan-Out 


Fan-in is the number of ports (inputs) of a gate. For example, the NAND gate of Figure 10.3 has a 
fan-in of 2. 

Fan-out is the number of input ports that an output port can drive. For example, Figure 10.5 says that the maximum output current that can be sourced by a conventional TTL gate when high ($I_{OH}$) is 0.4 mA, while the maximum current that a similar gate might sink at the input when high ($I_{IH}$) is 0.04 mA. The conclusion is that one TTL output can source 10 TTL inputs. This situation is illustrated in Figure 10.6(a).

Figure 10.5 also says that the maximum current that a conventional TTL gate can sink at the output when low ($I_{OL}$) is 16 mA, while the maximum current that a similar gate might source at the input when low ($I_{IL}$) is 1.6 mA. The conclusion is that again one TTL output can drive 10 TTL inputs. This situation is illustrated in Figure 10.6(b).

The worst (smallest) of these two values (current source versus current sink) is taken as the actual fan-out. Because they are both equal to 10 in the example above, the fan-out of a conventional TTL gate is said to be 10. In other words, the output of any conventional TTL gate can be directly connected to up to 10 inputs of other conventional TTL gates.

Because there are several TTL options, a standard load was defined for fan-out comparisons. The chosen load is LS TTL. Because the input of an LS gate sinks at most 0.02 mA (when high) and sources at most 0.4 mA (when low), the resulting fan-out for a conventional TTL gate (with respect to LS TTL) is 0.4/0.02 = 20 or 16/0.4 = 40; taking the worst case, fan-out = 20 results. In other words, a conventional TTL output can be directly connected to up to 20 LS inputs.
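The worst-case rule can be written down directly. In the sketch below, currents are given as magnitudes in microamperes so that integer division is exact; the function and constant names are illustrative, while the numeric values are the ones quoted in this section.

```python
def fan_out(i_oh, i_ol, i_ih, i_il):
    """Worst-case number of inputs that one output can drive.

    i_oh, i_ol : maximum output currents of the driver (high, low)
    i_ih, i_il : maximum input currents of each driven gate (high, low)
    All values are magnitudes in the same unit (microamperes here).
    """
    high_state = i_oh // i_ih   # inputs that can be sourced when the output is high
    low_state = i_ol // i_il    # inputs that can be sunk when the output is low
    return min(high_state, low_state)


# Conventional TTL output currents, in microamperes.
TTL_I_OH, TTL_I_OL = 400, 16_000

ttl_to_ttl = fan_out(TTL_I_OH, TTL_I_OL, i_ih=40, i_il=1_600)
ttl_to_ls = fan_out(TTL_I_OH, TTL_I_OL, i_ih=20, i_il=400)

print("TTL driving TTL inputs:", ttl_to_ttl)
print("TTL driving LS inputs: ", ttl_to_ls)
```

For LS loads the high state is the limiting one (20 inputs versus 40 in the low state), which is why the fan-out is 20.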


10.3.5 Supply Voltage, Signal Voltages, and Noise Margin 


The supply voltage and minimum/maximum allowed signal voltages constitute another important set of 
parameters for any logic family because they determine the power consumption and the noise margins. 

The voltages for the TTL family are depicted in Figure 10.7(a), where gate A communicates with gate B. The detail in the center shows two bars, each with two voltage ranges. The bar on the left shows the voltage ranges that can be produced by A, while the bar on the right shows the voltage ranges that are acceptable to B. The nominal supply voltage is $V_{CC}$ = 5 V, and the allowed maximum/minimum signal voltages are $V_{OL}$ = 0.4 V, $V_{OH}$ = 2.4 V, $V_{IL}$ = 0.8 V, and $V_{IH}$ = 2.0 V.

FIGURE 10.7. (a) Supply and minimum/maximum signal voltages for the TTL family; (b) Corresponding noise margins.


The meaning of the parameters above was seen in Section 1.8 and is repeated below.

■ V_IL: Maximum input voltage guaranteed to be interpreted as '0'.
■ V_IH: Minimum input voltage guaranteed to be interpreted as '1'.
■ V_OL: Maximum output voltage produced by the gate when low.
■ V_OH: Minimum output voltage produced by the gate when high.


The noise margin can then be determined as follows, where NM_L represents the noise margin when
low and NM_H represents the noise margin when high.

NM_L = V_IL − V_OL
NM_H = V_OH − V_IH

For TTL, NM_L = 0.8 − 0.4 = 0.4 V and NM_H = 2.4 − 2.0 = 0.4 V result. This means that any noise
added to the input signals, when having an amplitude under 0.4 V, is guaranteed not to corrupt the system.
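
As a quick numerical check, the sketch below applies the usual definitions NM_L = V_IL − V_OL and NM_H = V_OH − V_IH (restated in Section 10.6.5) to the TTL voltages listed above. The function name `noise_margins` is illustrative only.

```python
def noise_margins(v_ol, v_oh, v_il, v_ih):
    """Return (NM_L, NM_H) in volts for a driver/receiver pair."""
    nm_low = v_il - v_ol    # room for noise on a low level
    nm_high = v_oh - v_ih   # room for noise on a high level
    return nm_low, nm_high


# TTL values from the text: V_OL = 0.4 V, V_OH = 2.4 V, V_IL = 0.8 V, V_IH = 2.0 V.
nm_l, nm_h = noise_margins(v_ol=0.4, v_oh=2.4, v_il=0.8, v_ih=2.0)
print(f"NM_L = {nm_l:.1f} V, NM_H = {nm_h:.1f} V")  # prints: NM_L = 0.4 V, NM_H = 0.4 V
```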


10.4  Emitter-Coupled Logic 


The last BJT-based family is ECL, which was developed for high-speed applications. Due to its perma- 
nent emitter current, it belongs to a category called CML (current-mode logic). In spite of its large power 
consumption and large silicon area (for transistors and resistors), ECL has the advantage of being the 
fastest of all logic circuits (employed, for example, in the construction of very fast registers, shown in 
Chapter 13). 

An OR gate, constructed with traditional ECL, is depicted in Figure 10.8. The supply voltage is
V_EE = −5.2 V, and the circuit also contains a reference voltage of −1.3 V. The logic levels are also unusual,
'0' = −1.7 V and '1' = −0.9 V. Its operation is based on a differential amplifier (formed by BJTs with coupled
emitters), having on the right side a fixed input, V_REF, and on the left side the input signals proper.
In this example, it suffices to have one of the inputs high to turn the transistor on the right OFF, so its
collector voltage grows, turning the output transistor ON, which in turn raises the output voltage (hence
y = a + b).


FIGURE 10.8. OR gate constructed with traditional ECL logic.

ECL attains high speed by not allowing the transistors to ever go into saturation, thus making the 
switching from one logic state to the other much faster (gate delays well under 1ns). However, it con- 
sumes even more power than TTL, and the circuits again require large silicon areas (for BJTs and resistors). 
Additionally, because of its low signal excursion (0.8 V), a small noise margin results (~0.25 V). Applica- 
tion examples for this architecture (with some circuit variations) will be presented in Chapter 13 when 
examining the construction of very high speed registers. 


10.5 MOS-Based Logic Families 


The next part of this chapter (Sections 10.6 to 10.8) describes logic circuits constructed with MOSFETs 
(metal oxide semiconductor field effect transistors). 

MOS-based logic circuits occupy much less space than BJT-based circuits and, more importantly, can 
operate with virtually no static power consumption (CMOS family). Moreover, aggressive downsizing 
has continuously improved their performance. For instance, as mentioned in Section 9.2, MOS technology
is often referred to by using the smallest transistor dimension that can be fabricated (shortest
transistor channel, for example), whose evolution can be summarized as follows: 8 µm (1971), ..., 0.18 µm
(2000), 0.13 µm (2002), 90 nm (2004), and 65 nm (2006). The 45 nm technology is expected to be shipped
in 2008.

To enhance performance even further, copper (which has lower resistivity) is used instead of alumi- 
num to construct the internal wires in top-performance chips. Moreover, the addition of SiGe to obtain 
strained transistors and the use of SOI implementations (both described in Section 9.8) have further 
improved their high-frequency behavior. 

The following static MOS-based architectures are described in this chapter: 


■ CMOS logic
■ Pseudo-nMOS logic
■ Transmission-gate logic
■ BiCMOS logic (mixed)

The following dynamic MOS-based architectures are also included:

■ Dynamic logic
■ Domino logic
■ C²MOS logic


10.6 CMOS Logic 


CMOS stands for complementary MOS because for each nMOS transistor there also is a pMOS transistor. 
The most fundamental attribute of this arrangement is that it allows the construction of digital circuits 
with the smallest power consumption of all digital architectures. 


10.6.1 CMOS Circuits 


CMOS logic was introduced in Chapter 4, in which all gates (inverter, AND, NAND, OR, NOR, etc.) 
were illustrated using the corresponding CMOS circuit. 

In particular, the CMOS inverter was examined in detail in Sections 4.2, 9.5, and 9.6. More specifically, 
its operation, power consumption, and timing diagrams were described in Section 4.2; its construction, 
main parameters, DC response, and transition voltage were presented in Section 9.5; and finally, its 
transient response was shown in Section 9.6. 

As a brief review, three of the CMOS circuits studied in Chapter 4 are shown in Figure 10.9, which 
contains a CMOS inverter, a 3-input NAND/AND gate, and a 3-input NOR/OR gate. Other CMOS 
circuits will be seen later, like the construction of registers, in Chapter 13. 


10.6.2 HC and HCT CMOS Families 


We describe next two 5V CMOS families, called HC and HCT, both employed in the 74/54-series of 
digital ICs. 




FIGURE 10.9. CMOS gates: (a) Inverter; (b) 3-input NAND/AND; (c) 3-input NOR/OR. 
HC and HCT CMOS families (JEDEC JESD7A standard), all values @25°C

Parameter                       Symbol   Test condition   Value for HC   Value for HCT   Unit
Nominal supply voltage          V_DD                      5 ± 0.5        5 ± 0.5         V
Minimum input high voltage      V_IH                      0.7 V_DD       2               V
Maximum input low voltage       V_IL                      0.2 V_DD       0.8             V
Minimum output high voltage     V_OH     I_O = −20 µA     V_DD − 0.1     V_DD − 0.1      V
                                         I_O = −4 mA      V_DD − 0.52    V_DD − 0.52     V
Maximum output low voltage      V_OL     I_O = 20 µA      0.1            0.1             V
(CMOS loads, TTL loads)                  I_O = 4 mA       0.26           0.26            V
Maximum input high current      I_IH     V_I = V_DD       1              1               µA
Maximum input low current       I_IL     V_I = 0 V        −1             −1              µA
Fan-out (LS loads)                                        10             10              --
Typical gate delay              t_P      C_L = 15 pF      8              10              ns
Typical power consumption       P        f = 0 Hz         <10            <10             µW
per gate                                 f = 100 kHz      80             80              µW


FIGURE 10.10. Main parameters of the HC and HCT CMOS families employed in the 74/54-series of digital ICs. 


The HC family is the CMOS counterpart of the TTL family. Like TTL, its nominal supply voltage is 
5V (though it can operate anywhere in the 2V-6V range). However, not all of its input voltages are 
compatible with TTL, so HC and TTL gates cannot be mixed directly. To cope with this limitation, a TTL- 
compatible version of HC, called HCT, also exists. The main parameters of these two families are listed 
in Figure 10.10, extracted from the JEDEC JESD7A standard of August 1986. 

Compared to TTL, these families exhibit approximately the same speed but lower power con- 
sumption and wider output voltages ('0' is closer to GND and '1' is closer to Vpp than in any TTL 
version). Moreover, the input current is practically zero, allowing many CMOS gates to be driven by 
a single gate. 

Besides HC and HCT, there are also other, less popular 5V CMOS versions. However, 5V logic is 
nearly obsolete, with modern designs employing almost exclusively LVCMOS (low-voltage CMOS) and 
other low-voltage standards (described in Section 10.9), so 5V chips are currently used almost only as 
replacement parts. 


10.6.3 CMOS-TTL Interface 


The output of an HC or HCT chip can be connected directly to any TTL input because the corresponding
voltages are compatible (observe in Figure 10.10 that V_OH and V_OL of CMOS fall within the allowed
ranges of V_IH and V_IL for TTL, shown in Figure 10.5) and so is the fan-out (= 10 LS loads). On the other
hand, a TTL output cannot be connected directly to an HC input because one of the output voltages (V_OH)
of the former falls in the forbidden range of the latter (note in Figure 10.5 that V_OH of TTL can be as low
as 2.4 V, while HC requires at least 3.5 V for a high input). Consequently, if TTL and HC must be
interconnected, then a voltage shifter must be employed (for example, an open-collector buffer with a pull-up
resistor or a specially designed level translator). Another solution is the replacement of HC with HCT, in
which case full compatibility occurs.
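
The voltage part of this compatibility argument can be sketched as a simple check: a driver can feed a receiver only if its worst-case high output is at or above the receiver's minimum high input and its worst-case low output is at or below the receiver's maximum low input. The names `Levels` and `can_drive` are illustrative, V_DD = 5 V and the 4 mA output values are assumed, and currents (fan-out) are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Levels:
    """Worst-case signal voltages of a logic family, in volts."""
    v_ol: float  # maximum output voltage when low
    v_oh: float  # minimum output voltage when high
    v_il: float  # maximum input voltage read as '0'
    v_ih: float  # minimum input voltage read as '1'


def can_drive(driver: Levels, receiver: Levels) -> bool:
    """True if the driver's output levels fall inside the receiver's input ranges."""
    return driver.v_oh >= receiver.v_ih and driver.v_ol <= receiver.v_il


# Values taken from the text and Figure 10.10 (V_DD = 5 V, outputs at 4 mA).
TTL = Levels(v_ol=0.4, v_oh=2.4, v_il=0.8, v_ih=2.0)
HC = Levels(v_ol=0.26, v_oh=4.48, v_il=1.0, v_ih=3.5)
HCT = Levels(v_ol=0.26, v_oh=4.48, v_il=0.8, v_ih=2.0)

print(can_drive(HC, TTL))   # True: CMOS outputs satisfy TTL inputs
print(can_drive(TTL, HC))   # False: 2.4 V is below the 3.5 V that HC requires
print(can_drive(TTL, HCT))  # True: HCT inputs accept TTL levels
```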
10.6.4 Fan-In and Fan-Out


The definitions of fan-in and fan-out were seen in Section 10.3 (see Figure 10.6). As mentioned there,
LS TTL (Figure 10.5) is used as the standard load for fan-out comparison. Because an LS input can sink
20 µA or source 0.4 mA, while an HC/HCT output can source or sink 4 mA, the HC/HCT's fan-out is 10
(which is the smaller of 4/0.4 = 10 and 4/0.02 = 200).


10.6.5 Supply Voltage, Signal Voltages, and Noise Margin 


The supply voltage and maximum/minimum allowed signal voltages constitute another important set of 
parameters for any logic family because they determine the power consumption and the noise margins. 
For example, it was seen in Section 4.2 that the dynamic power consumption of a CMOS inverter is given
by Equation 4.6, so any change of V_DD strongly affects the power consumption.

It was also seen in Section 10.3 that the noise margins when low and when high are given by
NM_L = V_IL − V_OL and NM_H = V_OH − V_IH, respectively. For the HC family, these calculations are illustrated
in Figure 10.11(a), where gate A communicates with gate B. The bar on the left shows the voltage ranges
that can be produced by A, while the bar on the right shows the voltage ranges that are acceptable to B.
The nominal supply voltage is V_DD = 5 V, and the worst case (4 mA) allowed maximum/minimum signal
voltages are V_OL = 0.26 V, V_OH = 4.48 V, V_IL = 1 V, and V_IH = 3.5 V. Consequently, NM_L = 1 − 0.26 = 0.74 V and
NM_H = 4.48 − 3.5 = 0.98 V result, which are listed in Figure 10.11(b).

Note that in Figure 10.11(a) it was assumed that gate A is sourcing/sinking a high current to/from
gate B, so the output voltages are relatively far from GND and V_DD (observe in Figure 10.10 that two sets
of values were defined for V_OL and V_OH, one for low current, the other for high current). In case B were a
CMOS gate (or several CMOS gates), A would operate with very small currents, so V_OL and V_OH would
be within 0.1 V of the rail voltages, yielding higher noise margins.
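
The effect of the load current on the noise margins can be made concrete with the values of Figure 10.10. The sketch below assumes V_DD = 5 V and an HC receiver; the dictionary and function names are illustrative.

```python
V_DD = 5.0
V_IH = 0.7 * V_DD  # minimum input high voltage of an HC input
V_IL = 0.2 * V_DD  # maximum input low voltage of an HC input

# (V_OL, V_OH) of an HC output for the two test currents listed in Figure 10.10.
HC_OUTPUT_LEVELS = {
    "20 uA": (0.1, V_DD - 0.1),    # CMOS loads
    "4 mA": (0.26, V_DD - 0.52),   # TTL loads
}


def hc_noise_margins(load):
    """Return (NM_L, NM_H) in volts for the given output current."""
    v_ol, v_oh = HC_OUTPUT_LEVELS[load]
    return V_IL - v_ol, v_oh - V_IH


for load in HC_OUTPUT_LEVELS:
    nm_l, nm_h = hc_noise_margins(load)
    print(f"I_O = {load}: NM_L = {nm_l:.2f} V, NM_H = {nm_h:.2f} V")
```

Under these assumptions the margins grow from 0.74 V and 0.98 V at 4 mA to 0.90 V and 1.40 V at 20 µA.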


FIGURE 10.11. (a) Voltage ranges for the HC CMOS family; (b) Corresponding noise margins.


10.6.6 Low-Voltage CMOS


As mentioned above, modern designs employ almost exclusively low-voltage CMOS circuits. Even
though their general architecture is still that seen in Chapters 4, 9, and again in this chapter, the supply
voltage was reduced from 5 V to 3.3 V, then to 2.5 V, 1.8 V, and 1.5 V. Recently, it was further reduced
to 1.2 V and, in state of the art chips (like top-performance FPGAs, described in Chapter 18), it is just
1 V. Corresponding I/O standards, described in Section 10.9, were needed to interface with chips from
these families.


10.6.7 Power Consumption 


Power consumption is a crucial specification in modern digital systems, particularly for portable 
(battery-operated) devices. A discussion on the power consumption (both static and dynamic) of 
CMOS logic is included in Section 4.2. 


10.6.8 Power-Delay Product 


The power-delay (PD) product is another important measure of circuit performance because it combines
information related to the circuit's speed with its corresponding power consumption. A discussion of
the PD product of a CMOS inverter, which extends to CMOS logic in general, is also included in Section 4.2.


10.7 Other Static MOS Architectures 


CMOS is the most popular logic architecture because of its practically zero static power consumption. 
However, there are applications (like gates with large fan-ins, shown later in the construction of memories) 
where other approaches are more appropriate. 

In addition to CMOS, six other MOS-based architectures will be introduced in this chapter. Three of 
them are static, like CMOS, while the other three are dynamic (clocked). The first group is described in 
this section, while the second is seen in the next. These six architectures are listed below. 


■ Static: Pseudo-nMOS logic, transmission-gate logic, and BiCMOS logic
■ Dynamic: Dynamic logic, domino logic, and C²MOS logic


10.7.1 Pseudo-nMOS Logic 


Two gates constructed with pseudo-nMOS logic are depicted in Figure 10.12. The circuit in (a) is a 
4-input NAND/AND, while that in (b) is a 4-input NOR/OR. 




FIGURE 10.12. (a) NAND/AND and (b) NOR/OR gates constructed with pseudo-nMOS logic. 
To construct a pseudo-nMOS gate, we simply replace all pMOS transistors in the corresponding
CMOS circuit with just one pMOS whose gate is permanently connected to ground (thus always ON).
The pMOS acts as a pull-up transistor and must be "weak" compared to the nMOS transistors (that is,
must have a small channel width-to-length ratio, W/L; see Sections 9.2-9.3), such that whenever nMOS
and pMOS compete, the former wins.

Note that the nMOS part of Figure 10.12(a) is similar to the nMOS part of the circuit in Figure 10.9(b), 
that is, the transistors are connected in series, which characterizes a NAND/AND gate (Section 4.3). 
Likewise, the nMOS part of Figure 10.12(b) is similar to the nMOS part of Figure 10.9(c), that is, the tran- 
sistors are connected in parallel, which characterizes a NOR/OR gate (Section 4.4). 

The main advantages of this approach are its reduced circuit size (~50% smaller than CMOS) and the
possibility of constructing NOR/OR gates with a large fan-in (because there are no piled, that is,
series-connected, transistors). On the other hand, the static power consumption is no longer zero (note in
Figure 10.12(b) that it suffices to have one input high for static current to flow from V_DD to ground).
Moreover, the rising transition can be slow if the parasitic capacitance at the pull-up node is high (because
the pull-up transistor is weak). The same can occur with the falling transition because of the contention
between the pMOS and nMOS transistors. This contention also prevents the output voltage from reaching
0 V, so true rail-to-rail operation is not possible.


10.7.2 Transmission-Gate Logic 


Figure 10.13(a) shows a switch implemented using a single MOSFET. This switch (called pass transistor)
has two main problems: poor '1' and slow upward transition. Suppose that our circuit operates with
3.3 V (='1') and that V_T = 0.6 V (where V_T is the transistor's threshold voltage; see Sections 9.2-9.3). When
sw='1' (=3.3 V) occurs, the transistor is turned ON, so if x='1' the output capacitor is charged toward
3.3 V. However, this voltage cannot grow above V_DD − V_T = 2.7 V (poor '1') because at this point the
transistor is turned OFF. Additionally, as the voltage approaches 2.7 V, the current decreases (because V_GS
decreases; see Equation 9.2), slowing down the final part of the transition from '0' to '1'.

The circuit of Figure 10.13(b), known as transmission gate (TG), solves both problems presented by the
pass transistor. It consists of a CMOS switch, that is, a switch constructed with an nMOS plus a pMOS
transistor. While the former has a poor '1' and a slow upward transition, the latter exhibits a poor '0' and
a slow downward transition. However, because both transistors are turned ON at the same time and
operate in parallel, one of them always guarantees the proper logic level as well as a fast transition, no
matter in which direction the transition occurs. Two TG symbols are shown in Figure 10.13(c). In the upper
symbol, the switch closes when sw='1' (as in Figure 10.13(b)), while in the other it closes when sw='0'.
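
The poor '1' of the nMOS switch, the poor '0' of the pMOS switch, and their cancellation in the TG can be illustrated with a first-order static model. This is only a sketch under stated assumptions: the function names are hypothetical, the pMOS threshold is assumed to have the same 0.6 V magnitude as the nMOS one, and the body effect and all transient behavior are ignored.

```python
V_DD = 3.3  # supply voltage ('1'), from the example in the text
V_T = 0.6   # threshold voltage magnitude (assumed equal for nMOS and pMOS)


def nmos_pass(v_in, v_gate=V_DD, v_t=V_T):
    """Final output of an ON nMOS switch: it stops charging at v_gate - v_t."""
    return min(v_in, v_gate - v_t)


def pmos_pass(v_in, v_gate=0.0, v_t=V_T):
    """Final output of an ON pMOS switch: it stops discharging at v_gate + v_t."""
    return max(v_in, v_gate + v_t)


def tg_pass(v_in):
    """Both switches in parallel: whichever one passes v_in unchanged wins."""
    n_out = nmos_pass(v_in)
    return n_out if n_out == v_in else pmos_pass(v_in)


for v_in in (0.0, V_DD):
    print(f"x = {v_in:.1f} V: nMOS {nmos_pass(v_in):.1f} V, "
          f"pMOS {pmos_pass(v_in):.1f} V, TG {tg_pass(v_in):.1f} V")
```

With x = 3.3 V the nMOS alone stops at 2.7 V, as in the text, while the TG output reaches the full 3.3 V.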




FIGURE 10.13. (a) Pass transistor; (b) Transmission gate (TG); (c) TG symbols; (d) A multiplexer implemented 
with TGs. 


FIGURE 10.14. Digital BiCMOS gates: (a) Inverter; (b) NAND. 


With TGs plus inverters, any logic gate can be constructed. An application example is depicted in
Figure 10.13(d), in which two TGs are used to implement a multiplexer (y = a when sel='0' or y = b if
sel='1'; in other words, y = sel'·a + sel·b).
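
The equivalence between the switch-level description (one TG closed at a time) and the Boolean expression y = sel'·a + sel·b can be verified exhaustively. The sketch below is a purely logical model with illustrative function names; it does not represent voltages or timing.

```python
from itertools import product


def tg_mux(a, b, sel):
    """Switch-level view: one TG conducts when sel = 0, the other when sel = 1."""
    return a if sel == 0 else b


def mux_expression(a, b, sel):
    """Boolean view: y = sel'.a + sel.b, with 0/1 integers as logic values."""
    return ((1 - sel) & a) | (sel & b)


print("sel a b | y")
for sel, a, b in product((0, 1), repeat=3):
    y = tg_mux(a, b, sel)
    assert y == mux_expression(a, b, sel)  # both views agree for every input
    print(f"  {sel} {a} {b} | {y}")
```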


10.7.3 BiCMOS Logic 


BiCMOS (bipolar +CMOS) logic gates should not be confused with BiCMOS technology (seen in Section 9.8). 
BiCMOS logic is an application of BiCMOS technology because it combines BJTs and MOSFETs in the same 
chip. The use of BiCMOS gates is desirable, for example, when the circuit must deliver high output currents 
(to drive large capacitive loads or feed high-current buses), because for high currents BJTs can be smaller and 
faster than MOSFETs. 

Two BiCMOS gates are depicted in Figure 10.14. In Figure 10.14(a), an inverter is shown, where the 
small inverter unit connected to the upper BJT’s base is a CMOS inverter (Figure 10.9(a)). In Figure 10.14(b), 
a NAND gate is presented, where the small NAND unit connected to the upper BJT is a CMOS NAND 
(Figure 10.9(b)). Notice that in both cases the BJTs appear only at the output and are all of npn type (faster 
than pnp). 

Like TTL, these circuits exhibit a poor '1' because the base-emitter junction voltage (~0.7 V) of the
upper BJT limits the output voltage to V_DD − 0.7 V. This poor '1' was acceptable in old designs (V_DD = 5 V),
but is a problem in present low-power designs (V_DD = 2.5 V). For that reason, BJTs are being replaced with
deep-submicron MOSFETs, which are now competitive even in relatively high-current applications.


10.8 Dynamic MOS Architectures 


We introduce now the last three MOS-based architectures. These are distinct from the others in the sense 
that they are dynamic, hence they are controlled by a clock signal. 


10.8.1 Dynamic Logic 


Two basic dynamic gates (NAND and NOR) are shown in Figure 10.15(a). Note that these circuits are 
similar to the pseudo-nMOS gates seen in Figure 10.12, except for the fact that the pull-up transistor is 
now clocked (and strong) instead of being permanently connected to ground. 

FIGURE 10.15. (a) Dynamic NAND and NOR gates; (b) Dynamic footed NAND and NOR gates.


These circuits operate in two phases, called precharge (for clock='0') and evaluation (for clock='1').
When clock='0', the pull-up transistor is turned ON, which, being strong, precharges the output node (y)
to a voltage near V_DD. Next, clock='1' occurs, turning the pMOS transistor OFF, so y is conditionally (it
depends on the inputs) discharged to GND.
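
The two-phase operation can be sketched with a small behavioral model. It is an idealization, not a circuit simulation: the class name `DynamicGate` is hypothetical, the output node is treated as a perfect capacitor (no leakage or charge sharing), and the contention that an unfooted gate may suffer during precharge is ignored.

```python
class DynamicGate:
    """Behavioral model of a dynamic gate with a precharged output node y."""

    def __init__(self, pull_down_conducts):
        # Function of the inputs: True when the nMOS network connects y to GND.
        self.pull_down_conducts = pull_down_conducts
        self.y = 1

    def step(self, clock, *inputs):
        if clock == 0:
            self.y = 1                        # precharge: pMOS pulls y high
        elif self.pull_down_conducts(*inputs):
            self.y = 0                        # evaluation: y is discharged
        return self.y                         # otherwise y keeps its charge


nand = DynamicGate(lambda a, b: a == 1 and b == 1)  # series nMOS transistors
nor = DynamicGate(lambda a, b: a == 1 or b == 1)    # parallel nMOS transistors

print(nand.step(0, 1, 1))  # 1: precharge, regardless of the inputs
print(nand.step(1, 1, 1))  # 0: both inputs high, node discharged
print(nand.step(1, 0, 1))  # 0: an input fell, but the node cannot recharge
print(nand.step(0, 0, 1))  # 1: next precharge restores the node
print(nand.step(1, 0, 1))  # 1: correct NAND value for a=0, b=1
```

The third call shows why an input that falls during evaluation goes unnoticed, which is the problem addressed by domino logic.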

The advantages of this approach over pseudo-nMOS are a smaller static power consumption and a 
faster upward transition. The former is due to the fact that the pull-up transistor now only remains ON 
while clock ='0'. The latter is due to the fact that the pMOS transistor is now strong (large W/L). 

The same two gates are shown in Figure 10.15(b), but now with both power-supply connections 
clocked. This is called footed dynamic logic and exhibits even faster output transitions and essentially 
zero static power consumption. On the other hand, the circuit is slightly more complex and clock loading 
is also higher. 

The dynamic gates of Figure 10.15 present a major problem, though, when the output of one gate
must be connected to the input of another. This problem is solved with domino logic (described below).


10.8.2 Domino Logic 


The dynamic gates of Figure 10.15 present a major problem when the output of one gate must be
connected to the input of another. During the evaluation phase the inputs must never change from high to
low, because such a transition is not perceived: an input that was high may already have discharged the
output node, and turning the nMOS transistor OFF afterward cannot recharge it (recall from Chapter 9 that
an nMOS transistor is turned ON when a positive voltage is applied to its gate). This is called the
monotonicity condition (the inputs are required to be monotonically rising).

A simple way of solving the monotonicity problem is with a static (CMOS) inverter placed between 
dynamic stages, as shown in Figure 10.16. Because during the precharge phase the output voltage of 
any dynamic gate is elevated by the pull-up transistor, the output of the inverter will necessarily be low. 
Therefore, during evaluation, this signal can only remain low or change from low to high, so the undesir- 
able high to low transition is prevented. 
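
A behavioral sketch of this arrangement, for a domino OR gate feeding a domino AND gate, is shown below. The class name `DominoGate` is hypothetical and the model is idealized (no leakage, charge sharing, or delays); it only illustrates that every stage output is low after precharge and can only rise during evaluation.

```python
from itertools import product


class DominoGate:
    """Dynamic stage (precharged node) followed by a static inverter."""

    def __init__(self, pull_down_conducts):
        self.pull_down_conducts = pull_down_conducts
        self.node = 1                      # precharged dynamic node

    def step(self, clock, *inputs):
        if clock == 0:
            self.node = 1                  # precharge
        elif self.pull_down_conducts(*inputs):
            self.node = 0                  # evaluation: conditional discharge
        return 1 - self.node               # static inverter output


def domino_or_and(a, b, c):
    """One precharge/evaluation cycle of y = (a OR b) AND c."""
    or_gate = DominoGate(lambda x, y: x == 1 or y == 1)    # parallel nMOS
    and_gate = DominoGate(lambda x, y: x == 1 and y == 1)  # series nMOS
    # Precharge: both inverter outputs are forced low.
    assert or_gate.step(0, a, b) == 0
    assert and_gate.step(0, 0, c) == 0
    # Evaluation: the second stage input can only go from low to high.
    or_out = or_gate.step(1, a, b)
    return and_gate.step(1, or_out, c)


for a, b, c in product((0, 1), repeat=3):
    assert domino_or_and(a, b, c) == ((a | b) & c)
print("domino OR -> AND matches (a OR b) AND c for all inputs")
```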
FIGURE 10.16. Domino logic (static inverters are placed between dynamic stages; the latter can be footed or
unfooted).


FIGURE 10.17. Dynamic D-type latch constructed with C²MOS logic.


This architecture is called domino and is depicted in Figure 10.16 for a domino OR gate followed by a
domino AND gate (the dynamic parts of the domino gates can be either footed or unfooted). Other variations
of domino also exist, like NP, NORA, and Zipper domino, which, due to major drawbacks, never
enjoyed much success in commercial chips.


10.8.3 Clocked-CMOS (C²MOS) Logic


Still another dynamic architecture is the so-called C²MOS logic. In it, a pair of clocked transistors is
connected directly to the output node, driving it into a high-impedance (floating) state every time
the transistors are turned OFF (this type of logic was seen in the construction of tri-state buffers in
Section 4.8).

A dynamic D-type latch constructed with C²MOS logic is depicted in Figure 10.17 (D latches will be
studied in Section 13.3). When clock='1', both clocked transistors are ON, causing the circuit to behave
as a regular CMOS inverter (the latch is said to be "transparent" in this situation because any change
in d is seen by q). However, when clock='0', both transistors are turned OFF, causing node q to float, so
changes in d do not disturb q (the latch is said to be "opaque"). Typically, this situation can only last a
few milliseconds because the parasitic capacitor (which is responsible for storing the data bit) is generally
very small. Note that the circuit of Figure 10.17 is similar to that of a 3-state buffer (see Figure 4.18),
but to operate as a latch, the output node cannot be shared with other circuits (that is, no other circuit is
allowed to feed that node).
10.9 Modern I/O Standards


Having seen the main architectures used inside digital ICs, we turn now to the discussion on how such circuits 
can be accessed. A set of rules specifying how the input/output accesses can be done is called an I/O standard. 

There are several I/O standards for communicating with integrated circuits, which vary in supply 
voltage, allowed signal ranges, and maximum speed, among other factors. For general applications at 
relatively low speeds, the most common are TTL, LVTTL, CMOS, and LVCMOS (where LV stands for 
low voltage). For higher speeds and more specific applications, more complex standards exist, like SSTL, 
HSTL, LVPECL, and LVDS. 

Even though the TTL and CMOS I/O standards were derived from the respective 5V families 
described earlier, it is important not to confuse I/O standards with logic families. When it is said that a 
certain chip complies with the TTL I/O standard, it does not mean that the chip is constructed with TTL 
circuits but simply that its input/output electrical parameters are compatible with those defined for the 
TTL family. In other words, logic family refers to the internal physical circuits, while I/O standard regards 
how such circuits can be accessed. 

In summary, a chip can contain any of the logic architectures described earlier (TTL, CMOS, pseudo- 
nMOS, domino, etc.), which are in principle independent from the I/O type chosen to access them. 
A good example is the set of I/Os used in state of the art CPLD/FPGA chips (Chapter 18), which include 
basically all I/Os that will be described here, among them TTL and LVTTL, though the internal circuits 
are constructed using only MOS transistors. 

The I/O standards presented in this section are listed in Figure 10.18 along with typical application 
examples. As can be seen, they are divided into three categories, as follows: 


Name               Nominal V_DDO (V)              Application example

Single-ended standards:
TTL                5                              General purpose
CMOS               5
LVCMOS             3.3, 2.5, 1.8, 1.5, 1.2, 1.0

Single-ended voltage-referenced terminated standards:
SSTL_3             3.3
SSTL_2             2.5                            Memory interface (DDR SDRAM)
SSTL_18            1.8                            Memory interface (DDR2 SDRAM)
SSTL_15            1.5                            Memory interface (DDR3 SDRAM)
HSTL-18            1.8                            Memory interface (QDR2 SRAM)
HSTL-15            1.5                            Memory interface (QDR2 SRAM)

Differential standards:
Diff. SSTL_3       3.3
Diff. SSTL_2       2.5                            Memory interface (DDR SDRAM)
Diff. SSTL_18      1.8                            Memory interface (DDR2 SDRAM)
Diff. SSTL_15      1.5                            Memory interface (DDR3 SDRAM)
Diff. HSTL-18      1.8                            SRAM memory and clock interface
Diff. HSTL-15      1.5                            SRAM memory and clock interface
LVDS and M-LVDS                                   Chip-to-chip communication and buses


FIGURE 10.18. List of modern (except TTL and CMOS) I/O standards described in this section.
■ Single-ended standards: One input, one output, with no reference voltage or termination
resistors.

■ Single-ended voltage-referenced terminated standards: One input, one output, with termination
resistors to reduce reflections at high speeds due to impedance mismatch (printed circuit board
traces are treated as transmission lines). The input stage is a differential amplifier, so a reference
voltage is needed.

■ Differential standards: Two inputs, two outputs (complemented), with termination resistors (with
the purpose described above). The input stage is again a differential amplifier, but now with
both inputs used for signals (so the reference voltage is no longer needed). These standards are
designed to operate with low voltage swings (generally well under 1 V) to achieve higher speeds
and cause low electromagnetic interference. Additionally, by being differential, they exhibit higher
noise immunity.


10.9.1 TTL and LVTTL Standards 


TTL 


This is an old single-ended 5 V I/O standard for general-purpose applications at relatively low speeds
(typically under 100 MHz). Its voltage/current specifications were borrowed from the standard TTL
family, described in Section 10.3. However, as already mentioned, if an integrated circuit requires the
TTL I/O standard to communicate, it does not mean that its internal circuitry employs bipolar transistors
or TTL architecture. The main parameters of this I/O are summarized in Figure 10.19. (TTL and CMOS are
the only old I/Os still sometimes used in modern designs.)


3.3V LVTTL 


The 3.3 V LVTTL (low-voltage TTL) is another single-ended I/O standard for general applications at
relatively low speeds (typically under 200 MHz). Like CMOS, the input driver is generally a CMOS
inverter, while the output driver is a CMOS push-pull or inverter circuit (depicted in Figure 10.20, which
shows an output of IC1 to be connected to an input of IC2).

The main parameters for the 3.3 V LVTTL I/O are summarized in Figure 10.21. As shown in the figure,
they were defined by the JEDEC JESD8C standard of September 1999, last revised in June 2006.


Parameter                       Symbol   Test condition   Value for TTL   Value for CMOS   Unit
(all @25°C)
Nominal supply voltage
Minimum input high voltage
Maximum input low voltage
Minimum output high voltage
Maximum output low voltage
Fan-out (LS loads)


FIGURE 10.19. Main parameters of TTL and CMOS I/O standards.


FIGURE 10.20. Typical output and input circuits for CMOS, LVTTL, and LVCMOS I/Os. 


3.3V LVTTL (JEDEC JESD8C standard, Sept/99, rev. June/06)

Parameter                       Symbol   Test condition
Nominal supply voltage          V_DD
Minimum input high voltage      V_IH
Maximum input low voltage       V_IL
Minimum output high voltage     V_OH     V_DD = min, I_O = −2 mA
Maximum output low voltage      V_OL     V_DD = min, I_O = 2 mA
Maximum input high current      I_IH     V_I = V_DD
Maximum input low current       I_IL


FIGURE 10.21. Main parameters for the standard 3.3V LVTTL I/O. 


10.9.2 CMOS and LVCMOS Standards 


CMOS 

Like TTL, this is an old 5V I/O standard for general-purpose applications at relatively low speeds. Its 
voltage/current specifications were borrowed from the HC CMOS family, described in Section 10.6 and 
repeated in Figure 10.19 (defined in the JEDEC JESD7A standard of August 1986). 


3.3V LVCMOS 

3.3 V LVCMOS (low-voltage CMOS) is another single-ended I/O standard whose applications are simi- 
lar to those of 3.3 V LVTTL, just with different I/O signal ranges. The internal circuits are also similar to 
those in Figure 10.20. The main parameters for this I/O, defined in the same JEDEC standard as 3.3 V 
LVTTL, are summarized in Figure 10.22. 


2.5V LVCMOS 

This I/O is an evolution of 3.3V LVCMOS, again single-ended and for the same types of applications, 
but operating with 2.5 V instead of 3.3 V. The internal circuits are similar to those in Figure 10.20. The 
main parameters for this I/O, defined in the JEDEC JESD8-5A standard of October 1995 and last revised 
in June 2006, are summarized in Figure 10.23. 
3.3V LVCMOS (JEDEC JESD8C standard, Sept/99, rev. June/06)

Parameter                       Symbol   Test condition
Nominal supply voltage          V_DD
Minimum input high voltage      V_IH
Maximum input low voltage       V_IL
Minimum output high voltage     V_OH     V_DD = min


FIGURE 10.22. Main parameters for the standard 3.3 V LVCMOS I/O. 


2.5V LVCMOS (JEDEC JESD8-5A standard, Oct/95, rev. June/06)

Parameter                       Symbol   Test condition    Value   Unit
Maximum input low voltage       V_IL
Minimum output high voltage     V_OH     I_O = −0.1 mA     2.1     V
                                         I_O = −1 mA       2.0     V
                                         I_O = −2 mA       1.7     V
Maximum output low voltage      V_OL     I_O = 0.1 mA      0.2     V
                                         I_O = 1 mA        0.4     V
                                         I_O = 2 mA        0.7     V


FIGURE 10.23. Main parameters for the standard 2.5V LVCMOS I/O.


1.8V LVCMOS 

This I/O is an evolution of 2.5V LVCMOS, again single-ended and for the same types of applications, 
but operating with 1.8 V instead of 2.5V. The internal circuits are similar to those in Figure 10.20. The 
main parameters for this I/O, defined in the JEDEC JESD8-7A standard of February 1997 and last revised 
in June 2006, are summarized in Figure 10.24. 


1.5V LVCMOS 

This I/O is an evolution of 1.8V LVCMOS, again single-ended and for the same types of applications, 
but operating with 1.5V instead of 1.8V. The internal circuits are similar to those in Figure 10.20. The 
main parameters for this I/O, defined in the JEDEC JESD8-11A standard of October 2000 and last revised 
in November 2005, are summarized in Figure 10.25. 
1.8V LVCMOS (JEDEC JESD8-7A standard, Feb/97, rev. June/06)

Parameter                       Symbol   Test condition    Value         Unit
Nominal supply voltage          V_DD                       1.8 ± 0.15    V
Minimum input high voltage      V_IH                       0.65 V_DD     V
Maximum input low voltage       V_IL                       0.35 V_DD     V
Minimum output high voltage     V_OH     I_O = −0.1 mA     V_DD − 0.2    V
                                         I_O = −2 mA       V_DD − 0.45   V
Maximum output low voltage      V_OL     I_O = 0.1 mA      0.2           V
                                         I_O = 2 mA        0.45          V


FIGURE 10.24. Main parameters for the standard 1.8V LVCMOS I/O.


1.5V LVCMOS (JEDEC JESD8-11A standard, Oct/00, rev. Nov/05)

Parameter                       Symbol   Test condition    Value         Unit
Minimum input high voltage      V_IH                       0.65 V_DD     V
Maximum input low voltage       V_IL                       0.35 V_DD     V
Minimum output high voltage     V_OH     I_O = −0.1 mA     V_DD − 0.2    V
                                         I_O = −2 mA       0.75 V_DD     V
Maximum output low voltage      V_OL     I_O = 0.1 mA      0.2           V
                                         I_O = 2 mA        0.25 V_DD     V


FIGURE 10.25. Main parameters for the standard 1.5V LVCMOS I/O. 


1.2V LVCMOS (JEDEC JESD8-12A standard, May/01, rev. Nov/05)

Parameter                       Symbol   Test condition    Value         Unit
Nominal supply voltage          V_DD                       1.2 ± 0.1     V
Minimum input high voltage      V_IH                       0.65 V_DD     V
Maximum input low voltage       V_IL                       0.35 V_DD     V
Minimum output high voltage     V_OH     I_O = −0.1 mA     V_DD − 0.1    V
Maximum output low voltage      V_OL     I_O = 0.1 mA                    V
                                         I_O = 2 mA        0.25 V_DD     V


FIGURE 10.26. Main parameters for the standard 1.2V LVCMOS I/O. 


1.2V LVCMOS 


This I/O is an evolution of 1.5V LVCMOS, again single-ended and for the same types of applications,
but operating with 1.2 V instead of 1.5 V. The internal circuits are similar to those in Figure 10.20. The
main parameters for this I/O, defined in the JEDEC JESD8-12A standard of May 2001 and last revised in
November 2005, are summarized in Figure 10.26.

1V LVCMOS


This I/O is an evolution of 1.2V LVCMOS, again single-ended and for the same types of applications, 
but operating with 1 V instead of 1.2 V. The internal circuits are similar to those in Figure 10.20. The main 
parameters for this I/O, defined in the JEDEC JESD8-14A standard of December 2001 and last revised in 
November 2005, are summarized in Figure 10.27. 
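
The 1.8 V, 1.5 V, 1.2 V, and 1 V LVCMOS standards all express their input thresholds as the same fractions of the supply voltage (0.65 VDD and 0.35 VDD in Figures 10.24 to 10.27). The short Python sketch below tabulates these thresholds at nominal supply. It is only an illustration: the function name is hypothetical and supply tolerances are ignored.

```python
# Input thresholds shared by the 1.8 V, 1.5 V, 1.2 V, and 1 V LVCMOS standards:
# VIH(min) = 0.65*VDD and VIL(max) = 0.35*VDD (nominal supply assumed).

def lvcmos_input_thresholds(vdd):
    """Return (VIH_min, VIL_max) in volts for a nominal supply voltage vdd."""
    return 0.65 * vdd, 0.35 * vdd


if __name__ == "__main__":
    for vdd in (1.8, 1.5, 1.2, 1.0):
        vih, vil = lvcmos_input_thresholds(vdd)
        print(f"VDD = {vdd:.1f} V: VIH(min) = {vih:.3f} V, VIL(max) = {vil:.3f} V")
```

Note that the 2.5 V standard is not included because this sketch assumes thresholds proportional to VDD, which is what the four tables above show.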


10.9.3 SSTL Standards 


SSTL (stub series terminated logic) is a technology-independent I/O standard for high-speed applica- 
tions (400 Mbps to 1.6Gbps), used mainly for interfacing with memory ICs (SSTL_2 for DDR, SSTL_18 
for DDR2, and SSTL_15 for DDR3 SDRAM—these memories are described in Chapter 16). 

The first difference between this I/O and LVCMOS can be observed in Figure 10.28 where the output 
and input buffers are shown. The former is still a push-pull or CMOS inverter, but the latter is a differen- 
tial amplifier (note that there are two input transistors, which have their source terminals tied together 
to a current source). This circuit exhibits high voltage gain and also allows the use of two inputs. 
Indeed, the output of this circuit is not determined by the absolute value of any of the inputs but rather 
by the difference between them (that is why it is called differential amplifier). If the positive input (indi- 
cated with a “+” sign) is higher, then the output voltage is high; otherwise, it is low. When operating in 
differential mode, both inputs are connected to incoming wires, while when operating in single-ended 
mode the negative input is connected to a reference voltage. 
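
The decision rule of the differential input buffer can be sketched in a few lines of Python. This is an idealized behavioral model (infinite gain, no hysteresis, no input offset), and the function names and the voltages used in the demonstration are assumptions made only for illustration.

```python
def differential_receiver(v_plus, v_minus):
    """Ideal differential input: the output depends only on the sign of
    (v_plus - v_minus), not on the absolute value of either input."""
    return '1' if v_plus > v_minus else '0'


def single_ended_receiver(v_in, v_ref):
    """Single-ended mode: the negative input is tied to a reference voltage."""
    return differential_receiver(v_in, v_ref)


if __name__ == "__main__":
    # Differential mode: both inputs come from the incoming wire pair.
    print(differential_receiver(1.4, 1.1))
    print(differential_receiver(1.1, 1.4))
    # Single-ended mode with a hypothetical 1.25 V reference.
    print(single_ended_receiver(0.9, 1.25))
```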


1V LVCMOS (JEDEC JESD8-14A standard, Dec/01, rev. Nov/05)
Parameter | Symbol | Test condition | Value | Unit
Normal supply voltage | VDD | | 1 ± 0.1 | V
Minimum input high voltage | VIH | | 0.65 VDD | V
Maximum input low voltage | VIL | | 0.35 VDD | V
Minimum output high voltage | VOH | IO = -0.1 mA | VDD - 0.1 | V
 | | IO = -2 mA | 0.75 VDD | V
Maximum output low voltage | VOL | IO = 0.1 mA | 0.1 | V
 | | IO = 2 mA | 0.25 VDD | V


FIGURE 10.27. Main parameters for the standard 1V LVCMOS I/O.




FIGURE 10.28. Typical output and input circuits for differential I/Os (SSTL, HSTL, LVDS, etc.). 

Another fundamental difference is that this I/O employs a supply voltage (normally called VDDO; see
Figure 10.28) that is independent from that of the IC (so it is technology independent), giving the designer
more flexibility. Moreover, it includes termination resistors to reduce reflections in the PCB lines, improving
high-speed operation. As mentioned above, it allows single-ended as well as differential operation.

Depending on how the termination resistors are connected, the circuit is classified as class I or class II. 
This is shown in Figure 10.29. The class I circuit has a series resistor at the transmitting end and a par- 
allel resistor at the receiving end, while the class II circuit includes an additional parallel resistor at 
the transmitting end (these are just typical arrangements; other variations exist in the corresponding 
JEDEC standards). Note that the parallel termination resistors are connected to a termination voltage,
VTT, which equals VREF.

Another particularity of this I/O is its set of specifications regarding the electrical parameters at the
receiving end. As illustrated in Figure 10.30, two sets of values are defined for VIH/VIL, called “dc” and
“ac” values. According to the corresponding JEDEC standards (JESD8-8, -9B, and -15A), the signal
should exhibit a minimum slope of 1 V/ns over the whole “ac” range, and should switch state when
the “ac” value is crossed, given that the signal remains above the “dc” threshold. This definition is a
little confusing because it might suggest the use of two hysteresis ranges when only one would probably
do and also because there is a subtle but important difference in this definition between standards.


FIGURE 10.29. Typical usage of SSTL I/Os (single-ended and differential, classes I and II).

FIGURE 10.30. Input signals to the SSTL I/O buffer. Voltage levels marked in the figure, from top to
bottom: VDDO, VIH(ac), VIH(dc), VREF, VIL(dc), VIL(ac), and 0 V.


Consequently, actual implementations might differ slightly, using, for example, only one hysteresis
range and considering only the “dc” values.

SSTL_3 (JEDEC JESD8-8 standard, Aug/96)
Parameter | Symbol | Value
Buffer supply voltage | VDDO |
Reference voltage | VREF |
Termination voltage | VTT |
Minimum DC input high voltage | VIH(dc) | VREF + 0.2
Maximum DC input low voltage | VIL(dc) |
Maximum AC input low voltage | VIL(ac) |
Class I: Minimum output high voltage | |
Class I: Maximum output low voltage | |
Class I: Minimum output current | | ±8 mA
Class II: Minimum output high voltage | |
Class II: Maximum output low voltage | |
Class II: Minimum output current | |

FIGURE 10.31. Main parameters for the standard SSTL_3 I/O.


As mentioned above, SSTL is another I/O standardized by JEDEC. It is presented in three versions, 
called SSTL_3, SSTL_2, and SSTL_18, whose main electrical parameters are summarized below. 


SSTL_3 


This was the first SSTL I/O. It operates with VDDO = 3.3 V and is specified in the JEDEC JESD8-8 standard
of August 1996. Its main parameters are shown in Figure 10.31.


SSTL_2 


This was the second SSTL I/O. It operates with VDDO = 2.5 V and is commonly used for interfacing
with DDR SDRAM memory chips (Chapter 16) operating with a 200 MHz clock, transferring data at
both clock edges (400 MTps; see Figure 16.13). Its main parameters, specified in the JEDEC JESD8-9B
standard of May 2002 and revised in October 2002, are shown in Figure 10.32.

SSTL_2 (JEDEC JESD8-9B standard, May/02, rev. Oct/02)
Parameter | Symbol | Value | Unit
Maximum DC input low voltage | VIL(dc) | VREF - 0.15 | V
Minimum AC input high voltage | VIH(ac) | VREF + 0.31 | V

FIGURE 10.32. Main parameters for the standard SSTL_2 I/O.

Parameter | Symbol | Value | Unit
Maximum DC input low voltage | VIL(dc) | VREF - 0.125 | V
Minimum AC input high voltage | VIH(ac) | VREF + 0.25 | V
Maximum AC input low voltage | VIL(ac) | VREF - 0.25 | V

FIGURE 10.33. Main parameters for the standard SSTL_18 I/O.


SSTL_18 

This was the third SSTL I/O. It operates with VDDO = 1.8 V and is commonly used for interfacing with
DDR2 SDRAM memory chips (Chapter 16) operating with a 400 MHz clock, transferring data at both
clock edges (800 MTps; see Figure 16.13). Its main parameters, specified in the JEDEC JESD8-15A
standard of September 2003, are shown in Figure 10.33.

SSTL_15

Even though this standard has not been completed yet, the semiconductor industry is already
using it for interfacing with DDR3 SDRAM memory chips (Chapter 16). This circuit operates
with VDDO = 1.5 V and an 800 MHz clock, transferring data at both clock edges (1.6 GTps; see
Figure 16.13).
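
The transfer rates quoted for SSTL_2, SSTL_18, and SSTL_15 follow directly from the fact that data are transferred at both clock edges, so the rate in MTps is twice the clock frequency in MHz. A minimal Python sketch of this arithmetic is given below; the function name is hypothetical and used only for illustration.

```python
def ddr_transfer_rate_mtps(clock_mhz):
    """Double data rate: one transfer on each clock edge, two per cycle."""
    return 2 * clock_mhz


if __name__ == "__main__":
    # Clock frequencies taken from the SSTL_2, SSTL_18, and SSTL_15 descriptions.
    for name, clock_mhz in (("SSTL_2 (DDR)", 200),
                            ("SSTL_18 (DDR2)", 400),
                            ("SSTL_15 (DDR3)", 800)):
        print(f"{name}: {clock_mhz} MHz clock, {ddr_transfer_rate_mtps(clock_mhz)} MTps")
```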

One final remark regarding the termination resistors shown in Figure 10.29 is that they were ini- 
tially installed off-chip, which became increasingly problematic in FPGA-based designs because of the 
large number of I/O pins. For that reason, OCT (on-chip termination) is used in modern FPGAs with 
the additional and very important feature of being automatically calibrated (this technique is called 
DCI (digitally controlled impedance)), thus optimizing the impedance matching between the circuit’s 
output and the transmission line for reduced signal reflection, hence improving the high-frequency 
performance. 


10.9.4 HSTL Standards 


HSTL (high-speed transceiver logic) is very similar to SSTL (they were developed at almost the same 
time). Its main application is for interfacing with QDR SRAM chips (described in Chapter 16). 

Originally, only HSTL-15 was defined (for VDDO = 1.5 V) through the JEDEC JESD8-6 standard of
August 1995. However, at least two other versions, called HSTL-18 (for VDDO = 1.8 V) and HSTL-12 (for
VDDO = 1.2 V), are in use.

Like SSTL, HSTL is technology independent (VDDO separated from VDD), specifies “dc” and “ac” signal
values, allows single-ended and differential operation, and is intended for high-frequency applications
(over 200 MHz).

Like SSTL, it employs termination resistors to reduce transmission-line reflections, hence 
improving high-frequency performance. Depending on how these resistors are connected, four 
classes of operation are defined, which are summarized in Figure 10.34 (only single-ended mode 
is shown in the figure; for differential mode, the same reasoning of Figures 10.29(c)-(d) can be 
adopted). 


10.9.5 LVDS Standard 


LVDS (low-voltage differential signaling) is a popular differential I/O standard for general-purpose 
high-speed applications (for distances up to ~10m). It is specified in the TIA/EIA-644-A (Telecommuni- 
cations Industry Association/Electronic Industries Alliance no. 644-A) standard. 

Some of the LVDS parameters are shown in Figure 10.35. In Figure 10.35(a), an LVDS link is depicted,
which consists of a driver-receiver pair, operating in differential mode and having a 100 Ω resistor
installed at the receiving end. The driver’s output waveform is depicted in Figure 10.35(b), showing
the allowed offset voltage VOS = 1.25 V ± 10% and the differential output voltage 0.247 V ≤ VOD ≤ 0.454 V
(nominal = 0.35 V). In Figure 10.35(c), corresponding receiver specifications are shown, saying that the
receiver must be capable of detecting a differential voltage VID as low as 100 mV over the whole 0 V to
2.4 V range.

Because the voltage swing is small (as seen above, the TIA/EIA-644-A standard defines a nominal
differential voltage VOD = 0.35 V), the driver is fast and produces a negligible electromagnetic field (low
EMI). Moreover, because the signaling is differential, it is highly immune to noise.

LVDS is one of the lowest-power I/Os in its category. To attain a nominal differential voltage
VOD = 0.35 V over a 100 Ω resistor, 3.5 mA must be delivered by the LVDS driver, which results in just
1.23 mW of power consumption.
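
The current and power figures above come from Ohm's law applied to the termination resistor. The Python sketch below reproduces that arithmetic for the nominal values given in the text (0.35 V over 100 Ω); the function name and default arguments are assumptions made for illustration, and only the power dissipated in the termination resistor is computed.

```python
def lvds_driver_current_and_power(v_od=0.35, r_term=100.0):
    """Return (current in A, power in W) for a differential voltage v_od
    developed across the termination resistor r_term."""
    current = v_od / r_term   # I = V / R
    power = v_od * current    # P = V * I
    return current, power


if __name__ == "__main__":
    current, power = lvds_driver_current_and_power()
    print(f"I = {current * 1e3:.2f} mA, P = {power * 1e3:.3f} mW")
```

The computed power is 1.225 mW, which corresponds to the rounded 1.23 mW quoted above.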


FIGURE 10.34. Typical usage of HSTL I/Os (only single-ended options shown).
Panels: (a) single-ended class I; (b) single-ended class II; (c) single-ended class III; (d) single-ended class IV.


FIGURE 10.35. (a) LVDS link; (b) Driver waveform; (c) Corresponding receiver specifications.

FIGURE 10.36. LVDS and M-LVDS configurations.


This is also a fast I/O. The specified minimum rise/fall time is 0.26 ns, which gives a theoretical
maximum speed of 1/(2 × 0.26 ns) = 1.923 Gbps (for example, LVDS I/Os operating at 1.25 Gbps have been
available in modern FPGAs for some time). However, the enormous technological evolution since this
limit (0.26 ns) was established in the 1990s, combined with other versions of LVDS that were tailored for
particular applications (like Bus LVDS and others), has already led to speeds over 2 Gbps.
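
The theoretical limit quoted above can be reproduced with the same rule of thumb used in the text, 1/(2 × rise time). Reading this as "one bit period can be no shorter than one rise time plus one fall time" is an interpretation assumed here, and the function name below is hypothetical.

```python
def max_bit_rate_gbps(t_rise_ns):
    """Rule-of-thumb maximum bit rate: one bit per (rise time + fall time),
    with equal rise and fall times given in nanoseconds."""
    return 1.0 / (2.0 * t_rise_ns)


if __name__ == "__main__":
    print(f"{max_bit_rate_gbps(0.26):.3f} Gbps")
```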

Finally, Figure 10.36 shows the four LVDS operating modes, called point-to-point simplex (unidirectional, 
like that in Figure 10.35(a)), point-to-point half-duplex (bidirectional, but one direction at a time), multidrop 
(simplex with several receivers), and multipoint (half-duplex with several driver-receiver pairs). 

The last configuration (Figure 10.36(d)) is officially called M-LVDS (multipoint LVDS), and it is defined
in a separate standard called TIA/EIA-899. It specifies, for the driver, a higher current, a higher differential
voltage (0.48 V ≤ VOD ≤ 0.65 V over 50 Ω), and a much higher tolerance to offsets (0.3 V ≤ VOS ≤ 2.1 V).
For the receiver, it specifies a higher sensitivity (50 mV) and also a wider range of operation (-1.4 V to
3.8 V instead of 0 to 2.4 V). However, M-LVDS is slower (~500 Mbps) than point-to-point LVDS.

To conclude this section, a high-speed application where LVDS is employed is described. 


10.9.6 LVDS Example: PCI Express Bus 


PCI Express (peripheral component interconnect express, or PCIe) is an I/O standard for personal computers,
employed for backplane communication between the CPU and expansion cards (video, memory, etc.).
It was introduced by Intel in 2004 and is much faster than previous PCI versions, so it is rapidly replacing
PCI, PCI-X, and AGP (accelerated graphics port, for video cards) interfaces.

Each PCIe link is composed of 1, 4, 8, 16, or 32 lanes. A single lane is depicted in Figure 10.37, which
consists of two pairs of wires for data, each operating in unidirectional mode using the differential
signaling standard LVDS described above. The data rate in each direction is 2.5 Gbps. However, the data
are encoded using the 8B/10B code described in Chapter 6, thus the net bit rate is 2.5 × 8/10 = 2 Gbps.
With 1 up to 32 lanes, the net bit rate in a PCIe link can go from 2 Gbps up to 64 Gbps. Version 2.0 of PCIe,
released in 2007, doubles its speed, from 2.5 Gbps to 5 Gbps in each lane, hence the net transfer rate can
be as high as 128 Gbps.
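
The net rates quoted above follow from the 8B/10B overhead (8 data bits per 10 transmitted bits) multiplied by the number of lanes. The Python sketch below repeats that calculation; the function name and its default argument are hypothetical and serve only as an illustration.

```python
def pcie_net_rate_gbps(lanes, raw_gbps_per_lane=2.5):
    """Net bit rate per direction of a PCIe link with 8B/10B encoding."""
    return lanes * raw_gbps_per_lane * 8 / 10


if __name__ == "__main__":
    for lanes in (1, 4, 8, 16, 32):
        print(f"{lanes:2d} lanes: {pcie_net_rate_gbps(lanes):6.1f} Gbps (2.5 Gbps per lane), "
              f"{pcie_net_rate_gbps(lanes, 5.0):6.1f} Gbps (5 Gbps per lane)")
```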

As an illustration, Figure 10.38 shows two PCIe boards, the first with one lane (top left) and the second
with 16 lanes (top right). The 16-lane option is very common for video boards. At the bottom left, four
PCIe connectors are shown for 1, 4, 8, and 16 lanes (36 to 166 pins), while at the bottom right a
conventional PCI connector (120 pins) is depicted for comparison.


FIGURE 10.37. PCIe lane (2.5 Gbps in each direction, 8B/10B encoded, net rate of 2 Gbps).


FIGURE 10.38. PCI Express boards and motherboard connectors. One-lane data-acquisition card shown on
top left, and a popular 16-lane video card on top right. PCI Express connectors (36 to 166 pins) shown on
bottom left, and the standard PCI (120 pins), for comparison, on bottom right.

10.10 Exercises


1. Logic circuit #1 
The questions below refer to the circuit of Figure E10.1. 
a. To which logic family does the part within the dark box belong? 
b. To which logic family does the overall circuit belong? 


c. Analyze the circuit and write down its truth table. What type of gate is this? 


FIGURE E10.1. 


2. Logic circuit #2 
The questions below refer to the circuit of Figure E10.2. 
a. To which logic family does the part within the dark box belong? 
b. To which logic family does the overall circuit belong? 


c. Analyze the circuit and write down its truth table. What type of gate is this? 


FIGURE E10.2. 


3. Logic circuit #3 
The questions below refer to the circuit of Figure E10.3. 
a. To which logic family does the part within the dark box belong? 
b. To which logic family does the overall circuit belong? 


c. Analyze the circuit and write down its truth table. What type of gate is this? 

FIGURE E10.3.


4. 54-series 
a. What are the differences between the 54- and 74-series? 
b. Is LS TTL available in both series? 
c. Are HC and HCT CMOS available in both series? 
5. 74-series #1 
a. What ICs from the 74-series contain 4-input NOR gates? 
b. And inverters with hysteresis? 
c. And D-type flip-flops? 
d. And 4-bit synchronous counters? 
6. 74-series #2 


Which ICs from the 74-series are needed to implement the circuit of Figure E10.6? 


FIGURE E10.6. 


7. In-out voltages 


In the specifications of any logic family, input and output voltage-related parameters are always
included. What are the meanings of the parameters VIL, VIH, VOL, and VOH (used, for example, in the
TTL, LVTTL, CMOS, and LVCMOS families)?


8. In-out currents 


a. Similarly to the question above, what are the meanings of the current-related parameters IIL, IIH,
IOL, and IOH?


b. Two of them are always negative. Which and why? 

9. Fan-out #1
Suppose that the ICs of a certain logic family exhibit IIL = -0.1 mA, IIH = 0.2 mA, IOL = 1.2 mA, and
IOH = -1.6 mA.
a. Draw a diagram similar to that in Figure 10.6.
b. What is the fan-out of this logic family (with respect to itself)?
c. What is its fan-out with respect to LS TTL?

10. Fan-out #2
Suppose that the ICs of a certain logic family exhibit IIL = -0.2 mA, IIH = 0.1 mA, IOL = 1.2 mA, and
IOH = -1.6 mA.
a. Draw a diagram similar to that in Figure 10.6.
b. What is the fan-out of this logic family (with respect to itself)?
c. What is its fan-out with respect to LS TTL?

11. Noise margin
Suppose that a certain logic family exhibits VIL = 1 V, VIH = 3.8 V, VOL = 0.3 V, and VOH = 4.7 V.
a. Draw a diagram similar to that in Figure 10.7.
b. Calculate this family’s noise margin when low and when high.

12. 3.3V LVCMOS
a. Draw a diagram similar to that in Figure 10.7 for the 3.3 V LVCMOS I/O standard. Assume that
VDD is exactly 3.3 V.
b. Calculate this family’s noise margin when low and when high.

13. 2.5V LVCMOS
a. Draw a diagram similar to that in Figure 10.7 for the 2.5 V LVCMOS I/O standard. Assume that
VDD is exactly 2.5 V and that IO = |1 mA|.
b. Calculate this family’s noise margin when low and when high.

14. 1.8V LVCMOS
a. Draw a diagram similar to that in Figure 10.7 for the 1.8 V LVCMOS I/O standard. Assume that
VDD is exactly 1.8 V and that IO = |2 mA|.
b. Calculate this family’s noise margin when low and when high.

15. 1.5V LVCMOS
a. Draw a diagram similar to that in Figure 10.7 for the 1.5 V LVCMOS I/O standard. Assume that
VDD is exactly 1.5 V and that IO = |2 mA|.
b. Calculate this family’s noise margin when low and when high.

16. 1.2V LVCMOS
a. Draw a diagram similar to that in Figure 10.7 for the 1.2 V LVCMOS I/O standard. Assume that
VDD is exactly 1.2 V and that IO = |2 mA|.
b. Calculate this family’s noise margin when low and when high.

17. 1V LVCMOS
a. Draw a diagram similar to that in Figure 10.7 for the 1 V LVCMOS I/O standard. Assume that
VDD is exactly 1 V and that IO = |2 mA|.
b. Calculate this family’s noise margin when low and when high.

18. Loaded LVCMOS gate #1 


Figure E10.18 shows a 2.5V LVCMOS gate. In (a), it feeds a pull-up load RL, while in (b) it feeds
a pull-down load RL. Assume that VDD is exactly 2.5 V. Using the parameters given for this I/O in
Figure 10.23, answer the questions below.

a. Estimate the minimum value of RL in (a) such that the voltage at node y, when y='0', is not
higher than 0.4 V.

b. Estimate the minimum value of RL in (b) such that the voltage at node y, when y='1', is not lower
than 2 V.

c. In (a) and (b) above is it necessary to establish a limit for the maximum value of RL? Explain.


FIGURE E10.18.


19. Loaded LVCMOS gate #2 


The questions below refer to the circuits seen in Figure E10.18, which can be answered using the
parameters given for the 2.5V LVCMOS I/O in Figure 10.23. If RL = 3.3 kΩ is employed in both
circuits, estimate the voltage (or voltage range) of node y in the following cases:


a. For circuit (a) when y='0'. 
b. For circuit (a) when y='1'. 
c. For circuit (b) when y='0'. 
d. For circuit (b) when y='1'. 

20. NOR gate
Draw the MOS-based circuit for a 3-input NOR gate using: 
a. CMOS logic 
b. Pseudo-nMOS logic 
c. Dynamic footed logic 
21. XOR gate 
Draw the MOS-based circuit for a 2-input XOR gate using: 
a. CMOS logic 
b. Pseudo-nMOS logic 
c. Dynamic footed logic 
22. XNOR gate 
Draw the MOS-based circuit for a 2-input XNOR gate using:
a. CMOS logic 
b. Pseudo-nMOS logic 
c. Dynamic footed logic 
23. AND gate with 3-state output 


Using CMOS and C²MOS logic, draw an AND gate with tri-state output. (Suggestion:
see Section 4.8.)


24. NAND gate with 3-state output


Using CMOS and C²MOS logic, draw a NAND gate with tri-state output. (Suggestion:
see Section 4.8.)


25. XOR with TGs

Using transmission-gate (TG) logic, draw a circuit for a 2-input XOR gate.
26. XNOR with TGs

Using transmission-gate (TG) logic, draw a circuit for a 2-input XNOR gate. 
27. CMOS inverter 

If not done yet, solve the following exercises relative to the CMOS inverter: 

a. Exercise 9.14 

b. Exercise 9.15 

c. Exercise 9.16 

d. Exercise 9.17 


e. Exercise 9.18 

28. nMOS inverter

If not done yet, solve the following exercises relative to the nMOS inverter:

a. Exercise 9.19
b. Exercise 9.20
c. Exercise 9.21

29. Pseudo-nMOS NOR #1

Consider the N-input pseudo-nMOS NOR gate depicted in Figure E10.29, and say that M (0 ≤ M ≤ N)
represents the number of inputs that are high.

a. Discuss: Why do the nMOS transistors that are ON always operate in the triode mode? And
why can the pMOS transistor operate in either (saturation or triode) mode?

b. When the pMOS transistor operates in triode mode, prove that the output voltage is given by
the expression below, where k = Mβn/βp:

$$V_y = \frac{k(V_{DD}-V_{Tn}) + V_{Tp} - \sqrt{\left[k(V_{DD}-V_{Tn}) + V_{Tp}\right]^2 - (k-1)(V_{DD}+2V_{Tp})V_{DD}}}{k-1} \tag{10.3}$$

c. When the pMOS transistor operates in saturation, prove that

$$V_y = V_{DD} - V_{Tn} - \sqrt{(V_{DD}-V_{Tn})^2 - (V_{DD}+V_{Tp})^2/k} \tag{10.4}$$

FIGURE E10.29.

30. Pseudo-nMOS NOR #2

The questions below regard again the circuit of Figure E10.29.

a. Intuitively, what voltage do you expect at node y when x1 = x2 = ... = xN = '0'?

b. Say that N = 4 and that all transistors have the same size (recall from Chapter 9 that in this case
βn ≈ 3βp). Consider also that VTn = 0.6 V, VTp = -0.7 V, and VDD = 5 V. Calculate the output voltage
(Vy) for M = 0, M = 1, M = 2, M = 3, and M = 4 using both expressions given in the previous exercise.

c. Comment on the results above: Did they match your expectations? In each case, is the pMOS
transistor actually operating in triode or saturation? Why are the results from the two expressions
almost alike?

d. Why cannot Vy ever be 0 V?

31. Pseudo-nMOS NOR #3

Suppose that you are asked to design the circuit gate of Figure E10.29 such that the worst-case
output voltage is Vy = 1 V.

a. What is the worst case? That is, for what value of M should you develop the design? In this case,
in which mode do the nMOS and pMOS transistors operate?

b. Determine the relationship βn/βp needed to fulfill the specification above (Vy = 1 V). Use the
same circuit and transistor parameters listed in the previous exercise.


32. Pseudo-nMOS NAND #1 


Consider the N-input pseudo-nMOS NAND gate depicted in Figure E10.32. Prove that the voltage
at node y is given by the same two equations of Exercise 10.29 but with k = βn/(Nβp).


FIGURE E10.32. 


33. Pseudo-nMOS NAND #2

The questions below regard again the circuit of Figure E10.32.

a. Intuitively, what voltage do you expect at node y when at least one input is '0'?

b. Say that N = 4 and that (W/L)n = 3(W/L)p (recall from Chapter 9 that in this case βn ≈ 9βp).
Consider also that VTn = 0.6 V, VTp = -0.7 V, and VDD = 5 V. Calculate the output voltage (Vy) using
both expressions seen in the previous exercise.

c. Comment on the results above. Did they match your expectations? In which regime (saturation
or triode) do the nMOS and pMOS transistors actually operate?

d. Why cannot Vy ever be 0 V?

34. Pseudo-nMOS NAND #3

Determine the relationship βn/βp needed to guarantee that Vy = 1 V in the pseudo-nMOS NAND gate
of Figure E10.32 when all inputs are high (use the same parameters listed in the previous exercise).

35. SSTL

Check the documentation of at least one of the following JEDEC standards for SSTL I/Os and
compare the values given there against those listed in the figures of Section 10.9.

a. JESD8-8 standard (for SSTL_3, Figure 10.31)
b. JESD8-9B standard (for SSTL_2, Figure 10.32)
c. JESD8-15A standard (for SSTL_18, Figure 10.33)

36. HSTL

Check the documentation of the JEDEC standard JESD8-6, which defines the HSTL-15 I/O, and do
the following:

a. Construct a table for it similar to what was done for SSTL in Figures 10.31-10.33.
b. Check in it the four operating modes depicted in Figure 10.34.
c. Draw a figure similar to Figure 10.34 but for the differential versions of HSTL instead of
single-ended.

37. LVDS

a. Examine the TIA/EIA-644-A standard, which specifies the LVDS I/O. Check in it the information
given in Figure 10.35.

b. Examine also the TIA/EIA-899 standard, for M-LVDS, and write down its main differences with
respect to regular LVDS.

Combinational Logic Circuits


Objective: Chapters 1 to 10 introduced fundamental concepts, indispensable devices, basic circuits, 
and several applications of digital electronics. From this chapter on, we focus exclusively on circuit 
analysis, design, and simulation. 

Digital circuits can be divided into two main groups, called combinational and sequential. The former 
is further divided into logical and arithmetic, depending on the type of function (i.e., logical or arithmetic) 
that the circuit implements. We start by studying combinational logic circuits in this chapter, proceeding to 
combinational arithmetic circuits in the next, and then sequential circuits in the chapters that follow. This 
type of design (combinational logic) will be further illustrated using VHDL in Chapter 20. 


Chapter Contents 


11.1 Combinational versus Sequential Logic
11.2 Logical versus Arithmetic Circuits
11.3 Fundamental Logic Gates
11.4 Compound Gates
11.5 Encoders and Decoders
11.6 Multiplexer
11.7 Parity Detector
11.8 Priority Encoder
11.9 Binary Sorter
11.10 Shifters
11.11 Nonoverlapping Clock Generators
11.12 Short-Pulse Generators
11.13 Schmitt Triggers
11.14 Memories
11.15 Exercises
11.16 Exercises with VHDL
11.17 Exercises with SPICE


11.1 Combinational versus Sequential Logic 


By definition, a combinational logic circuit is one in which the outputs depend solely on its current
inputs. Thus the system is memoryless and has no feedback loops, as in the model of Figure 11.1(a). In
contrast, a sequential logic circuit is one in which the output does depend on previous system states, so
storage elements are necessary, as well as a clock signal that is responsible for controlling the system
evolution. In this case, the system can be modeled as in Figure 11.1(b), where a feedback loop, containing
the memory elements, can be observed. For example, without feedback, the circuit of Figure 11.1(c)
would be purely combinational, but the presence of such a loop converts it into a sequential circuit
(a D-type latch, Section 13.3), because now its output does depend on previous states.

FIGURE 11.1. (a) Combinational logic and (b) sequential logic models; (c) The feedback loop converts the
circuit from combinational to sequential (D latch).

It is important to observe, however, that not all circuits that possess storage capability are sequential. 
ROM memories (Chapter 17) are good examples. From the memory-read point of view, they are combi- 
national circuits because a memory access is not affected by previous memory accesses. 
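
The distinction between the two models can be illustrated with a short behavioral sketch in Python. The AND gate is a pure function of its present inputs, whereas the D latch keeps a stored value between calls. The names below are hypothetical, and the latch is assumed to be transparent when clk='1' (the polarity is an assumption made for illustration).

```python
def and_gate(a, b):
    """Combinational: the output depends only on the current inputs."""
    return a & b


class DLatch:
    """Sequential: the output also depends on the stored state q."""

    def __init__(self):
        self.q = 0  # storage element (initial value assumed to be '0')

    def update(self, d, clk):
        # Transparent while clk = 1; holds the previous value while clk = 0.
        if clk:
            self.q = d
        return self.q


if __name__ == "__main__":
    latch = DLatch()
    print(and_gate(1, 1), and_gate(1, 1))   # same inputs always give the same output
    print(latch.update(d=1, clk=1))         # latch stores d
    print(latch.update(d=0, clk=0))         # clk low: output still reflects the stored value
```

Calling update twice with the same arguments can give different results depending on what was stored before, which is precisely what makes the circuit sequential.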


11.2 Logical versus Arithmetic Circuits 


The study of combinational circuits is separated into two parts, called combinational logic circuits (Chapter 11) 
and combinational arithmetic circuits (Chapter 12). 

As the name says, the first of these types implements logical functions, like AND, OR, XOR, multiplexers, 
address encoders/decoders, parity detectors, barrel shifters, etc., while the second implements arithmetic 
functions, like adders, subtracters, multipliers, and dividers. In an actual design it is important that the 
designer understand in which of these areas every major section of a project falls because distinct analysis 
and implementation approaches can be adopted. 


11.3 Fundamental Logic Gates 


Fundamental logic gates were already described in Chapters 4 and 10. As a review, they are summarized 
in Figure 11.2, which shows their symbols and some MOS-based implementations. 


■ Inverter: y = x' (CMOS circuit included in Figure 11.2)

■ Buffer: y = x (if noninverting, two inverters in series can be used to implement it, with the second
stage often stronger to provide the required I/O parameters)

■ Switches: y = sw'·Z + sw·a (pass-transistor (PT) and transmission-gate (TG) switches are shown in
Figure 11.2)

■ Tri-state buffer: y = ena'·Z + ena·a (two implementations are shown in Figure 11.2, with TG-logic
and C²MOS logic; note that the former is inverting)

■ NAND: y = (a·b)' (CMOS circuit shown in Figure 11.2)

■ AND: y = a·b (this is a NAND gate followed by an inverter)

■ NOR: y = (a+b)' (CMOS circuit shown in Figure 11.2)

■ OR: y = a+b (this is a NOR gate followed by an inverter)

■ XOR: y = a⊕b = a'·b + a·b' (two implementations are presented in Figure 11.2, one with CMOS
logic, the other with TG-logic)

■ XNOR: y = (a⊕b)' = a'·b' + a·b (again, two implementations are included in Figure 11.2, one with
CMOS logic, the other with TG-logic)

FIGURE 11.2. Fundamental logic gates (symbols followed by examples of MOS-based implementations).
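
As a quick check of the Boolean expressions in the list above, the two-input gates can be written directly from their equations and their truth tables printed. This is only an illustrative Python sketch with hypothetical function names; it models logic values, not the transistor-level circuits of Figure 11.2.

```python
from itertools import product


def inv(x):
    return 1 - x                               # y = x'


def nand(a, b):
    return inv(a & b)                          # y = (a.b)'


def nor(a, b):
    return inv(a | b)                          # y = (a+b)'


def xor(a, b):
    return (inv(a) & b) | (a & inv(b))         # y = a'.b + a.b'


def xnor(a, b):
    return (inv(a) & inv(b)) | (a & b)         # y = a'.b' + a.b


if __name__ == "__main__":
    print("a b | NAND NOR XOR XNOR")
    for a, b in product((0, 1), repeat=2):
        print(f"{a} {b} |  {nand(a, b)}    {nor(a, b)}   {xor(a, b)}    {xnor(a, b)}")
```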


11.4 Compound Gates 


Having reviewed the fundamental logic gates, we now examine the constructions of larger gates. 
Because they combine functions performed by different basic gates (notably AND, OR, and inversion), 
they are referred to as compound gates. 

It was seen in Chapter 5 that any logic function can be expressed as a sum-of-products (SOP), like
y = a·b + c·d, or as a product-of-sums (POS), like y = (a+b)·(c+d). In this section we show how to draw a
circuit for each of these expressions.

FIGURE 11.3. (a) SOP-based CMOS circuit for y = a·b + c·d; (b) POS-based CMOS circuit for y = (a+b)·(c+d).


11.4.1 SOP-Based CMOS Circuit 


Given a Boolean function in SOP form, each term is computed using an AND-like circuit (transistors in 
series) in the lower (nMOS) side, and an OR-like circuit (transistors in parallel) in the upper (pMOS) side 
of the corresponding CMOS architecture. All AND branches must be connected in parallel, and all OR 
sections must be in series. Moreover, the output needs to be inverted. 

This procedure is illustrated in Figure 11.3(a) for the function y = a·b + c·d. Note that its inverted
version, y' = (a'+b')·(c'+d'), which is also produced by the circuit, is indeed the POS implementation for
the inverted inputs.
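
The procedure can be checked with a switch-level sketch in Python for y = a·b + c·d. Each transistor is modeled as an ideal switch (an nMOS conducts when its gate is '1', a pMOS when its gate is '0'); the function names are hypothetical and the model ignores all electrical effects.

```python
from itertools import product


def sop_cmos_stage(a, b, c, d):
    """Switch-level model of the CMOS stage for y = a.b + c.d (before the output inverter)."""
    # nMOS pull-down: two series (AND-like) branches connected in parallel.
    pull_down = bool((a and b) or (c and d))
    # pMOS pull-up: two parallel (OR-like) sections connected in series.
    pull_up = bool(((not a) or (not b)) and ((not c) or (not d)))
    # Exactly one of the two networks conducts for every input combination.
    assert pull_up != pull_down
    return 1 if pull_up else 0  # this node carries y'


def sop_output(a, b, c, d):
    """Output inverter added to obtain y = a.b + c.d."""
    return 1 - sop_cmos_stage(a, b, c, d)


if __name__ == "__main__":
    for a, b, c, d in product((0, 1), repeat=4):
        print(a, b, c, d, "->", sop_output(a, b, c, d))
```

The internal assertion confirms that the pull-up and pull-down networks are complementary, which is why the stage never leaves the output floating or short-circuits the supply.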


11.4.2 POS-Based CMOS Circuit 


Given a Boolean function in POS form, each term is computed using an AND-like circuit (transistors in 
series) in the upper (pMOS) side, and an OR-like circuit (transistors in parallel) in the lower (nMOS) side 
of the corresponding CMOS architecture. All AND branches must be connected in parallel, and all OR 
sections must be connected in series. Moreover, the output needs to be inverted. 

This procedure is illustrated in Figure 11.3(b) for the function y=(a+b)·(c+d). Note that its inverted
version, y'=a'·b'+c'·d', which is also produced by the circuit, is indeed the SOP implementation for the
inverted inputs.
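
The SOP procedure can be checked behaviorally. The sketch below is only an illustrative switch-level model (the function names are assumptions, not taken from the text): an nMOS transistor is treated as a switch that conducts when its gate is '1', and a pMOS transistor as a switch that conducts when its gate is '0'. For y=a·b+c·d, exactly one of the two networks should conduct for every input combination, and the internal node should carry y', so the output inverter yields y.

```python
from itertools import product

def pull_down(a, b, c, d):
    """nMOS side: (a in series with b) in parallel with (c in series with d)."""
    return bool((a and b) or (c and d))

def pull_up(a, b, c, d):
    """pMOS side: (a parallel b) in series with (c parallel d).
    A pMOS switch conducts when its gate input is 0."""
    return bool(((not a) or (not b)) and ((not c) or (not d)))

def sop_gate(a, b, c, d):
    """y = a.b + c.d as produced by the CMOS stage followed by an inverter."""
    to_ground = pull_down(a, b, c, d)
    to_supply = pull_up(a, b, c, d)
    assert to_ground != to_supply      # exactly one network conducts
    internal = 1 if to_supply else 0   # this node carries y'
    return 1 - internal                # output inverter

# Exhaustive check against the Boolean expression.
for a, b, c, d in product((0, 1), repeat=4):
    assert sop_gate(a, b, c, d) == ((a & b) | (c & d))
```

Swapping the roles of the two networks (series groups on the pMOS side, parallel groups on the nMOS side) gives the corresponding model of the POS procedure.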


EXAMPLE 11.1 SOP-BASED CMOS CIRCUIT


Draw CMOS circuits that implement the following Boolean functions (note that they are all in SOP
format):


a. y=a·b·c+d·e
b. y=a'·b+a·b'
c. y=a+b·c


SOLUTION


The circuits, implemented using the first of the two procedures described above, are shown in Figure 11.4.
Note that in Figure 11.4(b) the inverter was not employed because y'=a'·b'+a·b was implemented instead
(y is indeed the XOR function; compare this circuit to the XOR circuit in Figure 11.2).


FIGURE 11.4. CMOS implementations for (a) y=a·b·c+d·e, (b) y=a'·b+a·b', and (c) y=a+b·c.


EXAMPLE 11.2 POS-BASED CMOS CIRCUIT 


Draw CMOS circuits that implement the following Boolean functions (note that they are all in POS 
format): 


a. y=(a+b+c)·(d+e)
b. y=(a'+b)·(a+b')
c. y=a·(b+c)

SOLUTION 


The circuits, implemented using the second of the two procedures described above, are
shown in Figure 11.5. Note in Figure 11.5(b) that again the inverter was not employed because
y'=(a'+b')·(a+b) was implemented instead (y is the XNOR function). Comparing the CMOS circuits
of Figure 11.5 with those in Figure 11.4, we observe that, as expected, they are vertically reflected.


FIGURE 11.5. CMOS implementations for (a) y=(a+b+c)·(d+e), (b) y=(a'+b)·(a+b'), and (c) y=a·(b+c).
FIGURE 11.6. Implementations of y=a·b·c+d·e with (a) pseudo-nMOS logic and (b) footed domino logic.


The migration of any design from CMOS to other MOS architectures seen in Sections 10.7-10.8 is 
straightforward. As an example, the circuit of Figure 11.4(a) was converted from CMOS to pseudo- 
nMOS logic (Section 10.7) in Figure 11.6(a), obtained by simply replacing all pMOS transistors with just 
one weak pMOS whose gate is permanently connected to ground. The same circuit was also converted 
to footed domino logic (Section 10.8) in Figure 11.6(b), in which both connections to the power supply 
rails are clocked. 


11.5 Encoders and Decoders 


The encoders/decoders treated in this section are simply parallel logic translators, having no relationship
with other encoders/decoders described earlier (like those for line codes, seen in Chapter 6).


11.5.1 Address Decoder 


One of the most common decoders in this category is called address decoder because it is employed in
memory chips (Chapters 16 and 17) to activate the word line corresponding to the received address
vector. This type of circuit converts an N-bit input into a 2^N-bit output, as shown in the three equivalent
symbols of Figure 11.7(a), with the output having only one bit different from all the others (either low
or high).

The address decoder’s operation can be observed in the truth table of Figure 11.7(b) for the case of 
N=3 and with the dissimilar bit high (this type of code is called one-hot and will be seen in the implemen- 
tation of finite state machines in Chapters 15 and 23). As can be observed in the truth table, the input is 
the address of the dissimilar bit with address zero assigned to the rightmost bit. 

Four address decoder implementations, for N=2, are depicted in Figure 11.8. From the truth table in 
Figure 11.8(a), the following Boolean expressions are obtained for y: 


y0=x1'·x0' (SOP) or y0=(x1+x0)' (POS)
y1=x1'·x0 (SOP) or y1=(x1+x0')' (POS)
y2=x1·x0' (SOP) or y2=(x1'+x0)' (POS)
y3=x1·x0 (SOP) or y3=(x1'+x0')' (POS)

The circuit shown in Figure 11.8(b) is a direct implementation of the SOP expressions listed above
using AND gates. In Figure 11.8(c), a CMOS architecture is depicted, which was obtained using the POS 
equations and the procedure described in Section 11.4 (notice that some of the transistors are shared 
between adjacent branches), thus resulting in a NOR gate in each column. In Figure 11.8(d), pseudo- 
nMOS logic was employed instead of CMOS, again based on the POS expressions (so each column is 
still a NOR gate). Finally, in Figure 11.8(e), footed dynamic logic was employed, and the implementation 
was based on the SOP expressions instead of POS, thus resulting in NAND gates in the columns (in this 
case, the dissimilar bit is low instead of high). 
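
As a quick cross-check of the N=2 expressions, the following sketch (function names are illustrative only) evaluates both the AND-based (SOP) and the NOR-based (POS) forms and confirms that they agree and that the single high bit sits at the position given by the address.

```python
def decode_sop(x1, x0):
    """One-hot outputs (y3, y2, y1, y0) from the AND (SOP) expressions."""
    n1, n0 = 1 - x1, 1 - x0
    return (x1 & x0, x1 & n0, n1 & x0, n1 & n0)

def decode_pos(x1, x0):
    """Same outputs from the NOR (POS) expressions."""
    n1, n0 = 1 - x1, 1 - x0

    def nor(p, q):
        return 1 - (p | q)

    return (nor(n1, n0), nor(n1, x0), nor(x1, n0), nor(x1, x0))

for address in range(4):
    x1, x0 = (address >> 1) & 1, address & 1
    y = decode_sop(x1, x0)
    assert y == decode_pos(x1, x0)
    # y is ordered (y3, y2, y1, y0), so bit k of the address maps to y[3 - k].
    assert [y[3 - k] for k in range(4)] == [int(k == address) for k in range(4)]
```

Inverting every output of `decode_sop` gives the NAND-column behavior described for the dynamic implementation, where the dissimilar bit is low.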


[Figure 11.7(a): three equivalent address decoder symbols, with input x(N−1:0) and output y(2^N−1:0).]

x2x1x0 | y7...y0
000    | 00000001
001    | 00000010
010    | 00000100
011    | 00001000
100    | 00010000
101    | 00100000
110    | 01000000
111    | 10000000


FIGURE 11.7. (a) Address decoder symbols; (b) Truth table for N=3, with the dissimilar bit equal to '1' ("one-hot"
code).



FIGURE 11.8. Address decoder implementations for N=2: (a) Truth table; (b) SOP-based implementation 
with AND gates; (c) POS-based CMOS implementation (columns are NOR gates); (d) POS-based pseudo-nMOS 
implementation (again columns are NOR gates); (e) SOP-based footed dynamic implementation (columns are 
NAND gates). The dissimilar bit is high in all circuits except in (e). 
11.5.2 Address Decoder with Enable


Figure 11.9(a) shows an address decoder with an output-enable (ena) port. As described in the truth 
table, the circuit works as a regular decoder while ena='1', but turns all outputs low when ena='0', 
regardless of the input values. 

This case, with N=2, can be easily designed using Karnaugh maps to find the Boolean expressions for 
the outputs. Such maps are shown in Figure 11.9(b), from which the following equations result: 


y3=ena·x1·x0
y2=ena·x1·x0'
y1=ena·x1'·x0
y0=ena·x1'·x0'


An implementation for these equations using AND gates is depicted in Figure 11.9. The translation 
from this diagram to any of the transistor-level schematics of Figure 11.8 is straightforward. 


11.5.3 Large Address Decoders 


We now describe the construction of larger address decoders from smaller ones. Suppose that we want to
build the decoder of Figure 11.10(a), which has N=4 inputs and, consequently, 2^4=16 outputs, and that
we have 2-bit decoders (Figure 11.9(a)) available.

A solution is depicted in Figure 11.10(b), where four decoders are used to provide the 16-bit output.
The same two bits (x1x0) are fed to all of them. However, a fifth decoder, receiving the other two inputs
(x3x2), controls their enable ports. Because only one of the four decoders will be enabled at a time, the
proper output signals are produced.
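
The composition can be sketched behaviorally as follows. This is a minimal model under the assumption that each 2-bit decoder follows the enable equations given above; `decoder2` and `decoder4` are illustrative names.

```python
def decoder2(ena, x1, x0):
    """2-bit address decoder with enable; returns [y0, y1, y2, y3]."""
    n1, n0 = 1 - x1, 1 - x0
    return [ena & n1 & n0, ena & n1 & x0, ena & x1 & n0, ena & x1 & x0]

def decoder4(x3, x2, x1, x0):
    """4-bit decoder built from five 2-bit decoders; returns [y0, ..., y15]."""
    enables = decoder2(1, x3, x2)            # fifth decoder drives the enable ports
    y = []
    for ena in enables:                      # block j covers outputs y(4j) to y(4j+3)
        y.extend(decoder2(ena, x1, x0))      # all blocks receive the same x1, x0
    return y

for address in range(16):
    bits = [(address >> k) & 1 for k in (3, 2, 1, 0)]
    assert decoder4(*bits) == [int(k == address) for k in range(16)]
```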




FIGURE 11.9. Address decoder with enable. 
FIGURE 11.10. 4-bit address decoder constructed using 2-bit decoders.


Note that if only decoders without enable were available, a layer of AND gates at the output would 
also do. This, as will be shown in the next section, is equivalent to using a multiplexer to select which 
decoder should actually be connected to the output (Exercises 11.14-11.15). 


11.5.4 Timing Diagrams 


When a circuit is operating near its maximum frequency, the inclusion of internal propagation delays in
the analysis is fundamental. Timing diagrams for that purpose were introduced in Section 4.2, with three
simplified styles depicted in Figure 4.8. In the example below, one such diagram is employed in the
timing analysis of an address decoder.


EXAMPLE 11.3 ADDRESS-DECODER TIMING DIAGRAM


Figure 11.11 shows a 2-bit address decoder (borrowed from Figure 11.8). Given the signals x1
and x0 shown in the upper two plots, draw the resulting waveforms at the outputs. Adopt the
simplified timing diagram style seen in Figure 4.8(b), with the following propagation delays
in the inverters and AND gates: tp_INV=1 ns, tp_AND=2 ns. Assume that the vertical lines are
1 ns apart and observe that x1 and x0 do not change at exactly the same time, which is indeed
the case in real circuits.


FIGURE 11.11. Timing diagram for a 2-bit address decoder implemented with conventional gates.


SOLUTION 


The solution is included in Figure 11.11. Note that, to make it easier to follow, plots for x1' and x0'
were also included. As can be observed, only one output is high at a time. However, depending on
the specific implementation and its propagation delays, glitches might occur during the transitions
(which are generally acceptable in this type of circuit).


11.5.5 Address Encoder 


An address encoder does precisely the opposite of what an address decoder does, that is, it converts a
2^N-bit input that contains only one dissimilar bit into an N-bit output that encodes the position (address)
of the dissimilar bit. Figure 11.12 shows three equivalent address encoder symbols plus the truth table
for N=2 and also an implementation example (for N=2) using OR gates.
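
For N=2 with the dissimilar bit high, each output bit is simply the OR of the inputs whose address has that bit set. The equations used in the sketch below (y1=x3+x2 and y0=x3+x1) are derived here from that observation rather than copied from Figure 11.12, and the function name is illustrative.

```python
def address_encoder(x3, x2, x1, x0):
    """Returns (y1, y0), the address of the single high input bit.
    x0 does not appear because its address is "00"."""
    y1 = x3 | x2   # addresses 3 and 2 have the upper address bit set
    y0 = x3 | x1   # addresses 3 and 1 have the lower address bit set
    return (y1, y0)

for address in range(4):
    one_hot = [int(k == address) for k in (3, 2, 1, 0)]   # ordered x3, x2, x1, x0
    assert address_encoder(*one_hot) == ((address >> 1) & 1, address & 1)
```

Note that the all-zero input also produces "00", so this simple encoder cannot distinguish "no bit high" from "x0 high".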

Many other parallel encoders/decoders exist besides the address encoder/decoder. An example is 
given below. 



FIGURE 11.12. (a) Address encoder symbols; (b) Truth table for N=2; (c) Implementation example with OR gates. 


EXAMPLE 11.4 SSD DECODER


Figure 11.13(a) shows a seven-segment display (SSD), often used to display BCD-encoded numeric
digits from 0 to 9 and also other characters. Two common technologies employed in their fabrication are
LEDs (light-emitting diodes) and LCDs (liquid crystal displays). The segments have one end in common,
as illustrated for LED-based SSDs in Figures 11.13(b)-(c). In the common-cathode case, the cathode is
normally connected to ground, so a segment is turned ON when the bit feeding it is '1'. In the
common-anode configuration, the anode is normally connected to VDD, so the opposite happens, that is, a
segment is turned ON when its corresponding bit is '0' (inverted logic). In this example, we are interested in
using SSDs to display the output of a BCD counter (decimal digits from 0 to 9). Each digit is represented
by 4 bits, while an SSD requires 7 bits to drive a digit (or 8 if the decimal point is also used). Therefore,
a BCD-to-SSD converter (also called SSD driver or SSD decoder) is needed. A symbol for such a converter
appears in Figure 11.13(d), and the corresponding truth table, for positive logic (that is, common-cathode),
in Figure 11.13(e). Design a decoder to perform the BCD-to-SSD conversion. Recall that Karnaugh maps
can be helpful to obtain an optimal (irreducible) SOP or POS expression for the segments (a, b, ..., g).


input           | output (7 bits)   | output (8 bits)
ABCD    decimal | abcdefg   decimal | abcdefgh   decimal
0000    0       | 1111110   126     | 11111100   252
0001    1       | 0110000   48      | 01100000   96
0010    2       | 1101101   109     | 11011010   218
0011    3       | 1111001   121     | 11110010   242
0100    4       | 0110011   51      | 01100110   102
0101    5       | 1011011   91      | 10110110   182
0110    6       | 1011111   95      | 10111110   190
0111    7       | 1110000   112     | 11100000   224
1000    8       | 1111111   127     | 11111110   254
1001    9       | 1111011   123     | 11110110   246
others  10-15   | don't care        | don't care


FIGURE 11.13. (a) Seven-segment display (SSD); (b) Common-cathode configuration; (c) Common-anode
configuration; (d) BCD-to-SSD converter symbol; (e) Truth table for common-cathode decoder.


SOLUTION 


The input bits are represented by ABCD in the truth table, and the output bits by abcdefg. For each
output bit, a corresponding Karnaugh map is shown in Figure 11.14, from which we obtain the
following equations:


a=A+C+B·D+B'·D'
b=B'+C·D+C'·D'
c=(A'·B'·C·D')'
d=A+B'·C+B'·D'+C·D'+B·C'·D
e=B'·D'+C·D'
f=A+B·C'+B·D'+C'·D'
g=A+B·C'+B'·C+C·D'

An AND-OR implementation for each of these expressions is shown along with the Karnaugh 
maps in Figure 11.14. Another implementation, using only NAND gates, is shown at the bottom of 
Figure 11.14. 
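
The seven equations can be checked against the common-cathode truth table of Figure 11.13(e). The sketch below is only a verification aid (names such as `ssd` and `TRUTH` are illustrative); it evaluates each expression for the ten BCD digits and compares the result with the 7-bit abcdefg patterns from the table.

```python
# 7-bit patterns abcdefg for digits 0-9 (common-cathode truth table).
TRUTH = {0: "1111110", 1: "0110000", 2: "1101101", 3: "1111001", 4: "0110011",
         5: "1011011", 6: "1011111", 7: "1110000", 8: "1111111", 9: "1111011"}

def ssd(digit):
    """Evaluates the segment equations for a BCD digit; returns 'abcdefg'."""
    A, B, C, D = [(digit >> k) & 1 for k in (3, 2, 1, 0)]   # A is the MSB
    nA, nB, nC, nD = 1 - A, 1 - B, 1 - C, 1 - D
    a = A | C | (B & D) | (nB & nD)
    b = nB | (C & D) | (nC & nD)
    c = 1 - (nA & nB & C & nD)
    d = A | (nB & C) | (nB & nD) | (C & nD) | (B & nC & D)
    e = (nB & nD) | (C & nD)
    f = A | (B & nC) | (B & nD) | (nC & nD)
    g = A | (B & nC) | (nB & C) | (C & nD)
    return "".join(str(s) for s in (a, b, c, d, e, f, g))

for digit, pattern in TRUTH.items():
    assert ssd(digit) == pattern
```

Inputs 10 to 15 are don't-care conditions, so whatever the equations produce for them is not constrained by the table.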
FIGURE 11.14. BCD-to-SSD converter of Example 11.4 (for common-cathode display).


11.6 Multiplexer 


Multiplexers are very popular circuits for data manipulation. They act as switches that allow multiple 
choices of data paths. 
FIGURE 11.15. (a) Two-input multiplexer with one bit per input (2×1 mux); (b) Four-input multiplexer with
one bit per input (4×1 mux). Each figure shows the multiplexer's symbol, a conceptual circuit implemented
with switches, and the corresponding truth table.


11.6.1 Basic Multiplexers 


The symbol for a single-bit two-input multiplexer (thus called 2×1 mux) is presented on the left of
Figure 11.15(a), followed by a conceptual circuit constructed with switches, and finally the corresponding
truth table. The circuit has two main inputs (a, b), plus an input-select port (sel), and one output (y).
When sel='0', the upper input is connected to the output, so y=a; otherwise, if sel='1', then y=b. The
corresponding Boolean function then is y=sel'·a+sel·b.

Another multiplexer is depicted in Figure 11.15(b), this time with four single-bit inputs (4×1 mux).
A conceptual circuit, implemented with switches, is again shown, followed by the circuit's truth table.
As depicted in the figure, y=a occurs when sel="00" (decimal 0), y=b when sel="01" (decimal 1), y=c
when sel="10" (decimal 2), and finally y=d when sel="11" (decimal 3). The corresponding Boolean
function then is y=sel1'·sel0'·a+sel1'·sel0·b+sel1·sel0'·c+sel1·sel0·d, where sel1 and sel0 are the two
bits of sel.

When a multiplexer is implemented with switches, like in the central part of Figure 11.15, TGs (trans- 
mission gates, Figure 11.2) are normally employed. 
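
The two Boolean functions translate directly into code. The following sketch (illustrative function names) evaluates them with bitwise operators and checks that the selected input is the one that reaches the output.

```python
from itertools import product

def mux2(a, b, sel):
    """y = sel'.a + sel.b"""
    return ((1 - sel) & a) | (sel & b)

def mux4(a, b, c, d, sel1, sel0):
    """y = sel1'.sel0'.a + sel1'.sel0.b + sel1.sel0'.c + sel1.sel0.d"""
    n1, n0 = 1 - sel1, 1 - sel0
    return (n1 & n0 & a) | (n1 & sel0 & b) | (sel1 & n0 & c) | (sel1 & sel0 & d)

for a, b, c, d in product((0, 1), repeat=4):
    assert mux2(a, b, 0) == a and mux2(a, b, 1) == b
    for sel in range(4):
        expected = (a, b, c, d)[sel]          # sel is the index of the chosen input
        assert mux4(a, b, c, d, (sel >> 1) & 1, sel & 1) == expected
```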


EXAMPLE 11.5 NAND-BASED MULTIPLEXER
Draw a circuit that implements the 2×1 multiplexer of Figure 11.15(a) using regular NAND gates.


SOLUTION


We saw in Section 5.5 that any SOP equation can be implemented by a two-layer circuit containing
only NAND gates (see Figure 5.8), and possibly inverters. The mux's equation, in SOP format, was
already seen to be y=sel'·a+sel·b. Therefore, the circuit of Figure 11.16(a) results.


FIGURE 11.16. 2×1 multiplexer implemented with (a) NAND gates, (b) TGs, and (c) PTs.


EXAMPLE 11.6 TG- AND PT-BASED MULTIPLEXER 


Draw circuits that implement the 2 x 1 multiplexer of Figure 11.15(a) using TGs (transmission gates) 
and PTs (pass-transistors). 


SOLUTION 


TG and PT switches were seen in Figure 11.2. The corresponding multiplexers are depicted in
Figures 11.16(b) and (c), respectively.


11.6.2 Large Multiplexers 


We illustrate now the construction of larger multiplexers from smaller ones. Two cases arise: (i) with more 
bits per input or (ii) with more inputs (or both, of course). Both cases are depicted in the examples below. 


EXAMPLE 11.7 MUX WITH LARGER INPUTS


Suppose that we want to build the 2×3 mux (with 2 inputs of 3 bits each) of Figure 11.17(a), and we
have 2×1 muxes available. Show a solution for this problem.


FIGURE 11.17. 2×3 mux constructed with 2×1 muxes.


SOLUTION


A 2x3 mux is shown in Figure 11.17(b), constructed with three 2 x 1 muxes associated in parallel. 
Note that because each unit has only two inputs, sel is still a single-bit signal. (A common, simplified 
representation for the input-select line was employed in the figure, running under all multiplexers 
but obviously connected only to the sel port of each unit.) 


EXAMPLE 11.8 MUX WITH MORE INPUTS 


Suppose now that we want to build the 4×1 mux (with 4 inputs of 1 bit each) shown in Figure 11.18(a),
and we have 2×1 muxes available. Show a solution for this problem.


FIGURE 11.18. 4×1 mux constructed with 2×1 muxes.


SOLUTION 


The 4×1 mux is shown in Figure 11.18(b), constructed with two 2×1 muxes in parallel and one in
series. Note that now the inputs are still single-bit, but sel is multibit, with its LSB controlling the first
layer of muxes and the MSB controlling the second layer.
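
Both constructions (wider inputs and more inputs) can be sketched with a single 2×1 building block. The code below is a behavioral illustration only; the function names and the list-based bus representation are assumptions made for the sketch.

```python
def mux2(a, b, sel):
    """Single-bit 2x1 mux: y = a when sel = 0, y = b when sel = 1."""
    return b if sel else a

def mux2_bus(a, b, sel):
    """2xM mux: one 2x1 mux per bit, all sharing the same sel line."""
    return [mux2(ai, bi, sel) for ai, bi in zip(a, b)]

def mux4(a, b, c, d, sel1, sel0):
    """4x1 mux: the LSB of sel drives the first layer, the MSB the second."""
    upper = mux2(a, b, sel0)
    lower = mux2(c, d, sel0)
    return mux2(upper, lower, sel1)

# 2x3 mux of Example 11.7: three 2x1 muxes in parallel.
assert mux2_bus([1, 0, 1], [0, 1, 1], 0) == [1, 0, 1]
assert mux2_bus([1, 0, 1], [0, 1, 1], 1) == [0, 1, 1]

# 4x1 mux of Example 11.8: sel = "10" selects the third input (c).
assert mux4("a", "b", "c", "d", 1, 0) == "c"
```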


11.6.3 Timing Diagrams 


As mentioned earlier, when a circuit is operating near its maximum frequency, the inclusion of internal
propagation delays in the analysis is fundamental. Timing diagrams for that purpose were introduced
in Section 4.2, with three simplified styles depicted in Figure 4.8. In the example below, two such
diagrams are employed in the functional and timing analysis of a multiplexer.


EXAMPLE 11.9 MULTIPLEXER FUNCTIONAL ANALYSIS


Figure 11.19(a) shows a 2×1 multiplexer (seen in Figure 11.15(a)), to which the first three stimuli
plotted in Figure 11.19(b) are applied. Assuming that the circuit is operating at low frequency, so the
internal propagation delays are negligible, draw the corresponding waveform at the output.


SOLUTION


The waveform for y is included in Figure 11.19(b). When sel='0', y is a copy of a, while sel='1' causes
it to be a copy of b. In both cases, there is no time delay.


FIGURE 11.19. Timing diagrams for a 2×1 multiplexer (Examples 11.9 and 11.10).


EXAMPLE 11.10 MULTIPLEXER TIMING ANALYSIS 


Consider now that the 2×1 multiplexer of Figure 11.19(a) is operating near its maximum frequency,
so the internal propagation delays must be considered. Using the simplified timing diagram style
seen in Figure 4.8(b), draw the waveform for y with the circuit submitted to the same stimuli of
Example 11.9. Adopt the following propagation delays through the multiplexer:

From data (a or b) to y: tp_data=2 ns (either up or down).

From select (sel) to y: tp_sel=1 ns (either up or down).


SOLUTION 


The waveform for y is included in Figure 11.19(c), where the distance between the vertical lines is
1 ns. Gray shades were employed to highlight the propagation delays. When an input bit changes
its value during a selected state, the delay is 2 ns, while a bit value already available when the mux
changes its state exhibits a delay of 1 ns.


11.7 Parity Detector 


A parity detector is a circuit that detects whether the number of '1's in a binary vector is even or odd. Two
implementation examples, employing 2-input XOR gates, are shown in Figure 11.20, both producing y='1'
if the parity of x is odd, that is, y=x0⊕x1⊕...⊕x7. In Figure 11.20(a), the time delay is linear (the number
of layers is N−1, where N is the number of input bits). In Figure 11.20(b), a time-wise more efficient
implementation is depicted, which employs just log2(N) layers, with no increase in fan-in or number of gates.
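
The difference between the two structures can be illustrated by counting XOR layers. The sketch below (illustrative names; each function returns the parity bit and the number of gate layers traversed) models the chain and the tree for an arbitrary number of input bits.

```python
def parity_chain(bits):
    """Linear structure: each XOR feeds the next one."""
    y, layers = bits[0], 0
    for b in bits[1:]:
        y ^= b
        layers += 1
    return y, layers

def parity_tree(bits):
    """Tree structure: adjacent pairs are XORed layer by layer."""
    level, layers = list(bits), 0
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:           # an unpaired bit passes to the next layer
            nxt.append(level[-1])
        level, layers = nxt, layers + 1
    return level[0], layers

x = [1, 0, 1, 1, 0, 0, 1, 0]         # four '1's, so the parity is even
assert parity_chain(x) == (0, 7)      # N - 1 layers
assert parity_tree(x) == (0, 3)       # log2(N) layers
```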


11.8 Priority Encoder 


Priority encoders are used to manage accesses to shared resources. If two or more requests are received, 
the role of a priority encoder is to inform which of the inputs (requests) has higher priority. In the 
description that follows, it is assumed that the priority grows toward the MSB. 

Two priority encoder structures are represented in Figure 11.21. In Figure 11.21(a), the size of y (output)
is equal to the size of x (input), and only one bit of y is high at a time, whose position corresponds to the
position of the input bit with highest priority (see truth table). The structure in Figure 11.21(b) is slightly
different; the output displays the address of the input bit of highest priority, with y="000" indicating
the nonexistence of requests.


FIGURE 11.20. Odd-parity detectors with (a) linear and (b) logarithmic time delays.

FIGURE 11.21. Priority encoder symbols and truth tables.

We describe next two implementations for the circuit in Figure 11.21(a). To convert it into that of 
Figure 11.21(b), an address encoder (Section 11.5) can be employed. The Boolean equations for the circuit 
of Figure 11.21(a) can be easily derived from the truth table, resulting in: 


y7=x7
y6=x7'·x6
y5=x7'·x6'·x5
...
y0=x7'·x6'·x5'·x4'·x3'·x2'·x1'·x0


FIGURE 11.22. Priority encoder implementations with (a) linear and (b) logarithmic time delays.


Two implementations for these equations are depicted in Figure 11.22. In Figure 11.22(a), a ripple-type
circuit is shown, which has the simplest structure but the largest time delay because the last signal (y0)
has to propagate (ripple) through nearly N gates, where N is the number of bits (inputs). The circuit in
Figure 11.22(b) does not require much more hardware than that in Figure 11.22(a), and it has the
advantage that the number of gate delays is approximately log2(N) (for N=7 there are 3 layers, shown within
dashed boxes, plus the inverter layer and the output layer).
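
A behavioral sketch of the ripple-type structure follows. It is a direct reading of the equations above (the running AND of the inverted higher-priority inputs is the signal that ripples), with illustrative names and a list indexed by bit position.

```python
def priority_encoder(x):
    """x[i] is input bit i (x[-1] is the MSB, which has the highest priority).
    Returns y with at most one bit high, at the position of the winning request."""
    n = len(x)
    y = [0] * n
    higher_all_low = 1                      # AND of the inverted higher-priority inputs
    for i in range(n - 1, -1, -1):          # ripple from the MSB down to the LSB
        y[i] = higher_all_low & x[i]
        higher_all_low &= 1 - x[i]
    return y

requests = [0, 0, 1, 0, 0, 1, 0, 0]          # bits 2 and 5 are requesting
assert priority_encoder(requests) == [0, 0, 0, 0, 0, 1, 0, 0]   # bit 5 wins
assert priority_encoder([0] * 8) == [0] * 8  # no requests, no output high
```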


11.9 Binary Sorter 


A binary (or bit) sorter organizes the input bits in decreasing order, that is, all '1's then all '0's. A modular
circuit of this type is illustrated in Figure 11.23, where each cell is composed simply of two 2-input gates
(AND+OR). In this example, the input vector is x="01011", from which the circuit produces y="11100"
at the output. Note that the hardware complexity is quadratic with respect to the number of bits (the
number of cells is (N−1)N/2, where N is the number of bits).
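
The idea can be sketched with a triangular array of AND+OR cells. The arrangement below (a bubble-sort-like sweep) is an assumption made for illustration and may not match the exact wiring of Figure 11.23, but it uses the same cell and the same number of cells, (N−1)N/2.

```python
def sort_bits(x):
    """x[0] is the top bit. Returns (sorted bits, number of AND+OR cells used)."""
    y, cells = list(x), 0
    n = len(y)
    for sweep in range(n - 1):
        for i in range(n - 1 - sweep):
            # One cell: OR sends a '1' upward, AND sends a '0' downward.
            y[i], y[i + 1] = y[i] | y[i + 1], y[i] & y[i + 1]
            cells += 1
    return y, cells

y, cells = sort_bits([0, 1, 0, 1, 1])
assert y == [1, 1, 1, 0, 0]
assert cells == (5 - 1) * 5 // 2      # 10 cells for N = 5
```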


EXAMPLE 11.11 MAJORITY AND MEDIAN FUNCTIONS
A majority function outputs a '1' whenever half or more of its input bits are high. 
a. How can the binary sorter seen above be used to compute the majority function? 
b. How many basic cells (within dashed boxes in Figure 11.23) are needed? 


c. The median of a binary vector is the central bit (assuming that N is odd) of the ordered (sorted) 
set. If N is odd, what is the relationship between computing the majority function and computing 
the median of the input vector? 
FIGURE 11.23. Five-bit binary sorter.


SOLUTION 


a. All that is needed is to take as output the bit of y at position (N+1)/2, starting from the top in
Figure 11.23, if N is odd, or at position N/2, if N is even. If that bit is '1', then one-half or more of
the input bits are high.


b. The cells below the row mentioned above are not needed. Therefore, the total number of cells is
3(N^2−1)/8 ≈ 3N^2/8 when N is odd, or (3N−2)N/8 ≈ 3N^2/8 when N is even.


c. In this case, they coincide.


11.10 Shifters


The main shift operations were described in Section 3.3 and are summarized below. 


■ Logical shift: The binary vector is shifted to the right or to the left a certain number of positions,
and the empty positions are filled with '0's.


■ Arithmetic shift: As mentioned in Section 3.3, there are conflicting definitions regarding arithmetic
shift, so the VHDL definition will be adopted. It determines that the empty positions be filled with
the original rightmost bit value when the vector is shifted to the left or with the leftmost bit value
when shifted to the right.


■ Circular shift (or rotation): This case is similar to logical shift except for the fact that the empty
positions are filled with the removed bits instead of '0's.
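
The three operations can be illustrated on bit strings (leftmost character is the MSB). The sketch below is a behavioral illustration with assumed function names; the arithmetic shift follows the VHDL-style definition adopted above, and the input vector is assumed to be nonempty.

```python
def logical_shift(v, n, left):
    """Empty positions are filled with '0's."""
    n = min(n, len(v))
    return v[n:] + "0" * n if left else "0" * n + v[:len(v) - n]

def arithmetic_shift(v, n, left):
    """Left: fill with the original rightmost bit; right: fill with the leftmost bit."""
    n = min(n, len(v))
    return v[n:] + v[-1] * n if left else v[0] * n + v[:len(v) - n]

def circular_shift(v, n, left):
    """Empty positions are filled with the removed bits (rotation)."""
    n %= len(v)
    return v[n:] + v[:n] if left else v[len(v) - n:] + v[:len(v) - n]

assert logical_shift("10110", 2, left=False) == "00101"
assert logical_shift("10110", 2, left=True) == "11000"
assert arithmetic_shift("10110", 2, left=False) == "11101"
assert arithmetic_shift("10011", 2, left=True) == "01111"
assert circular_shift("10110", 2, left=False) == "10101"
assert circular_shift("10110", 2, left=True) == "11010"
```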


Circuits that perform the shift operations described above are often called barrel shifters. However, 
two definitions are commonly encountered. In the more restrictive one, a barrel shifter is a circuit that 
implements only rotation, while in the more general definition it is a circuit that can implement any of 
the shift operations. 
FIGURE 11.25. Arithmetic right shifter (logarithmic size, address decoder not needed).


Figure 11.24 shows a circular right shifter. The circuit has a 4-bit input x=x3x2x1x0 and a 4-bit output
y=y3y2y1y0. The amount of shift is determined by a 2-bit signal which, after going through an N=2
address decoder (Figure 11.8), produces the signal sh=sh3...sh0 shown in Figure 11.24. The relationship
between inputs and outputs is specified in the truth table. For simplicity, single-transistor switches were
employed in the circuit, though TGs (transmission gates, Figure 11.2) are generally preferred (they are
faster and avoid poor '1's). Because only one bit of sh is high at a time, only one of the columns will
be activated, thus connecting the corresponding values of x to y. One interesting feature of this circuit
is that, independently of the amount of rotation, each signal goes through only one switch. On the
other hand, its size (number of columns) is linearly dependent on the maximum amount of shift desired
(N columns for a full rotator, where N is the number of input bits).

An arithmetic right shifter is shown in Figure 11.25. The circuit has a 6-bit input x=x5...x1x0 and a 6-bit
output y=y5...y1y0. The amount of shift is determined by sh=sh2sh1sh0. The switches can be PTs or TGs
(Figure 11.2). The relationship between input and output is described in the truth table. Note that as the
vector is shifted to the right, the empty positions are filled with the MSB value as required in arithmetic right
shifters. Unlike the shifter of Figure 11.24, the size here is logarithmic (log2(N) columns for a full
rotator), and an address decoder is not needed.

We know from Section 11.6 that each pair of switches in Figure 11.25 is indeed a multiplexer. However,
because multiplexers can be implemented in other ways, it is sometimes preferable to simply show
multiplexer symbols rather than switches in the schematics of shifters. This approach is illustrated in
Figure 11.26, which shows an 8-bit logical left shifter. Notice that its size is logarithmic and that the empty
positions are filled with '0's, as required in logical shifters. The corresponding truth table is also included
in the figure.


FIGURE 11.26. Logical left shifter (logarithmic size, address decoder not needed).
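
A logarithmic shifter of this kind can be modeled as a cascade of mux columns, where column k either passes its input unchanged or shifts it by 2^k positions. The sketch below is a behavioral illustration for the 8-bit logical left shifter; the function names and the list representation (x[i] is bit i, with the LSB at index 0) are assumptions of the sketch, not taken from Figure 11.26.

```python
def shift_left_stage(x, amount, sel):
    """One column of 2x1 muxes: x unchanged when sel = 0, or x shifted left
    by `amount` positions (zero-filled) when sel = 1."""
    return [(x[i - amount] if i >= amount else 0) if sel else x[i]
            for i in range(len(x))]

def logical_left_shifter(x, sh):
    """sh = (sh2, sh1, sh0); the column driven by sh_k shifts by 2**k."""
    sh2, sh1, sh0 = sh
    y = shift_left_stage(x, 1, sh0)
    y = shift_left_stage(y, 2, sh1)
    y = shift_left_stage(y, 4, sh2)
    return y

def to_bits(value, n=8):
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

# Three columns cover every shift amount from 0 to 7.
for value in (0b00001011, 0xFF, 0x81):
    for s in range(8):
        sh = ((s >> 2) & 1, (s >> 1) & 1, s & 1)
        assert from_bits(logical_left_shifter(to_bits(value), sh)) == (value << s) & 0xFF
```

Replacing the zero fill with the MSB (and reversing the direction) would give the arithmetic right shifter, and wrapping the removed bits around would give a rotator.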


11.11 Nonoverlapping Clock Generators


As will be shown in Section 13.5, nonoverlapping clocks are employed sometimes to prevent clock-skew 
in flip-flops. 

Two examples of nonoverlapping clock generators are depicted in Figure 11.27. The circuit in
Figure 11.27(a) employs two NOR gates (with delay d2) plus an inverter (with delay d1). As can be seen
in the timing diagram, the nonoverlapping time interval between φ1 and φ2 is d2 on both sides, and the
dislocation of the generated signals with respect to the clock is d2 on the left (where φ1 is the first to
change) and is d1+d2 on the right (where φ2 is the first to change).

FIGURE 11.27. Nonoverlapping clock generators.

FIGURE 11.28. Single (φ1) and dual (φ2) pulses generated from clk for pulse-based flip-flops (these circuits
will be studied in Section 13.6; see Figure 13.18).


The circuit in Figure 11.27(b) operates with AND gates (with delay d2) and inverters (again with
delay d1). As shown in the accompanying timing diagram, the nonoverlapping time interval between
φ1 and φ2 is d1 on the left and 3d1 on the right. Moreover, the dislocation of the generated signals with
respect to the clock is d1+d2 on the left (where φ1 is the first to change) and d2 on the right (where φ2 is
the first to change).

The circuit in Figure 11.27(a) has superior signal symmetry, requires fewer transistors (10 against 22),
and has more reliable outputs (the feedback loop causes signal interdependencies, so φ1 can only go up
after φ2 has come down, and vice versa).
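
The interlocking effect of the feedback loop can be illustrated with a small unit-step simulation. The sketch below assumes the usual cross-coupled arrangement, φ1=NOR(clk, φ2) and φ2=NOR(clk', φ1), which may differ in detail from the wiring of Figure 11.27(a); the delay values (one time step for the inverter, two for each NOR gate) and all names are illustrative.

```python
def simulate(clk, d_inv=1, d_nor=2):
    """Unit-step simulation with pure gate delays.
    Assumes clk[0] == 0 and a settled circuit before t = 0."""
    def at(sig, t, initial):
        return sig[t] if t >= 0 else initial

    clk_n, phi1, phi2 = [], [], []
    for t in range(len(clk)):
        clk_n.append(1 - at(clk, t - d_inv, 0))                       # inverter
        p1 = 1 - (at(clk, t - d_nor, 0) | at(phi2, t - d_nor, 0))     # NOR(clk, phi2)
        p2 = 1 - (at(clk_n, t - d_nor, 1) | at(phi1, t - d_nor, 1))   # NOR(clk', phi1)
        phi1.append(p1)
        phi2.append(p2)
    return phi1, phi2

clk = ([0] * 10 + [1] * 10) * 3        # clock with a period of 20 time steps
phi1, phi2 = simulate(clk)

# The two phases are never high at the same time.
assert all(not (a and b) for a, b in zip(phi1, phi2))
# Rising clock edge at t = 10: phi1 falls d_nor later, phi2 rises another d_nor later.
assert phi1.index(0) == 12 and phi2.index(1) == 14
# Falling clock edge at t = 20: phi2 falls after d_inv + d_nor, then phi1 rises.
assert (phi2[22], phi2[23]) == (1, 0) and (phi1[24], phi1[25]) == (0, 1)
```

Under these assumptions the gap between the two phases equals the NOR delay on both clock edges, which is consistent with the symmetry noted above.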


11.12 Short-Pulse Generators 


Short pulses, derived from the clock, are needed for driving pulse-based flip-flops (Section 13.6). This
type of circuit falls in one of two categories: single-pulse generator (generates a pulse only at one of
the clock transitions; see φ1 in Figure 11.28) or dual-pulse generator (generates pulses at both clock
transitions; see φ2 in Figure 11.28). The former is employed with single-edge flip-flops, while the latter
is used with dual-edge flip-flops. Circuits that generate the waveforms of Figure 11.28 will be presented
in Section 13.6, relative to pulse-based flip-flops (see Figure 13.18).
11.13 Schmitt Triggers


In Chapter 14, which deals with sequential digital circuits, we will present a circuit called PLL (phase 
locked loop) that is not completely digital. The reason for including it in this book is that PLLs are 
now common units in advanced digital systems (mainly for clock multiplication and clock filtration, as 
will be seen, for example, in the study of FPGAs in Chapter 18). The same occurs with Schmitt triggers 
(STs), that is, even though they are not completely digital, their increasing presence in digital systems 
(as noise filters in the pads of modern digital ICs) makes it necessary to understand how they are con- 
structed and work. 

An ST is simply a noninverting or inverting buffer that operates with some hysteresis, generally
represented by one of the symbols shown in Figure 11.29(a). The transfer characteristics are depicted
in Figure 11.29(b) (for the noninverting case), showing two transition voltages, called VTR+ and VTR−,
where VTR+ > VTR−. When VIN is low, it must grow above VTR+ for the output to switch from '0' to '1';
conversely, when VIN is high, it must decrease below VTR− for the output to switch from '1' to '0'. Therefore,
ΔV=VTR+ − VTR− constitutes the gate's hysteresis, which prevents noise (up to a certain level, of course)
from inappropriately switching the circuit. This fact is illustrated in Figure 11.29(c), where the noisy
signal is VIN and the clean one is VOUT. Schmitt triggers are employed in the pads of modern digital ICs
(like CPLDs, Section 18.3) to reduce the effect of noisy inputs.
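
The filtering effect of hysteresis can be illustrated with a purely behavioral model. In the sketch below, the threshold values (2.0 V and 1.0 V), the sample sequence, and the function names are all arbitrary assumptions chosen for illustration; the single-threshold comparator is included only for contrast.

```python
def schmitt(vin_samples, v_tr_plus, v_tr_minus, initial=0):
    """Noninverting Schmitt trigger: the output goes to '1' only above VTR+
    and returns to '0' only below VTR-."""
    out, state = [], initial
    for v in vin_samples:
        if state == 0 and v > v_tr_plus:
            state = 1
        elif state == 1 and v < v_tr_minus:
            state = 0
        out.append(state)
    return out

def comparator(vin_samples, threshold):
    """Plain buffer with a single switching threshold (no hysteresis)."""
    return [1 if v > threshold else 0 for v in vin_samples]

def transitions(bits):
    return sum(1 for p, q in zip(bits, bits[1:]) if p != q)

noisy = [0.0, 1.4, 1.1, 1.6, 2.1, 1.5, 1.9, 2.5, 1.3, 1.7, 0.9, 1.2, 0.2]
clean = schmitt(noisy, v_tr_plus=2.0, v_tr_minus=1.0)

assert clean == [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
assert transitions(clean) == 2                      # one rise, one fall
assert transitions(comparator(noisy, 1.5)) > 2      # noise causes extra switching
```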

Three CMOS implementations of STs are shown in Figure 11.30, all requiring the same number of
transistors (six). Note that the circuit in Figure 11.30(a) is inverting, while those in Figures 11.30(b)-(c)
are noninverting.

The circuit in Figure 11.30(a) was one of the first CMOS STs and was employed in the old 74HC14
integrated circuit of the popular 74-series described in Section 10.6 (which used VDD=5 V). However,
because of its 4-transistor pile-up, it is not well suited for present-day low-voltage (VDD = 2.5 V)
applications. Its operation is straightforward. When VIN is low, M1-M2 are OFF and M3-M4 are ON,
causing VOUT to be high, hence turning M5 ON (though its source is floating) and M6 OFF. M5 pulls the
node between M1-M2 to a high voltage (VDD − VT, where VT is the threshold voltage of M5; see Sections
9.2-9.3). When VIN increases, it eventually turns M1-M2 ON and M3-M4 OFF. However, because the
source of M2 was charged to a high voltage by M5, M2 can be turned ON, producing VOUT=0 V, only
after that voltage is lowered by M1 (or VIN is high enough). Because the circuit is horizontally
symmetric, the reciprocal behavior occurs when VIN is high, that is, M1-M2 are ON and M3-M4 are OFF,
causing VOUT to be low, thus turning M6 ON (with the source floating) and M5 OFF. M6 lowers the
voltage on the node between M3-M4 to VT (where VT is now the threshold voltage of M6). Therefore,
when VIN decreases, it can only turn M3 ON after that node's voltage is increased by M4 (or VIN is low
enough). In summary, M5 and M6 cause the gate's upward and downward transition voltages to be
different, with the latter always smaller than the former (Figure 11.29(b)). The difference between these
two voltages constitutes the gate's hysteresis.


FIGURE 11.29. Schmitt trigger (a) symbols, (b) voltage transfer characteristics, and (c) effect on noise.

FIGURE 11.30. Three CMOS implementations of Schmitt trigger circuits.

Contrary to the ST of Figure 11.30(a), that in Figure 11.30(b) is adequate for low-voltage systems. It
employs three regular CMOS inverters, with one of them (M3-M4) forming a feedback loop. Moreover,
the inverter M1-M2 is stronger than M3-M4 (that is, the transistors have larger channel width-to-length
ratios, W/L; see Section 9.2). Its operation is as follows. When $V_{IN}$ is low, M1 is OFF and M2 is ON, causing
$V_{OUT}$ to be low too, hence turning M3 OFF and M4 ON. Note that M4 reinforces the role of M2, that is,
helps keep the feedback node high. When $V_{IN}$ grows, it must lower the voltage on the feedback node for
$V_{OUT} = V_{DD}$ to eventually occur. However, for that voltage to change from high to low, it is one nMOS
transistor (M1) against two pMOS transistors (M2, M4). Note that a reciprocal behavior occurs when $V_{IN}$
is high, that is, for the voltage on the feedback node to be turned from low to high it is one pMOS
transistor (M2) against two nMOS ones (M1, M3). Such an arrangement causes the gate's upward and downward
transition voltages to be inevitably different, with the latter again always smaller than the former.

The ST of Figure 11.30(c) is also adequate for low-voltage applications. Its main difference from the
previous STs is that contention between transistors no longer occurs, hence increasing the transition
speed. Contention is avoided by allowing the circuit to go into a "floating" state before finally settling in
a static position. The circuit contains two regular CMOS inverters (M1-M2 and M3-M4), connected to a
push-pull stage (Mn-Mp). Note that the transition voltages of the inverters are different; in Figure 11.30(c),
$V_{TR1} = 1$ V and $V_{TR2} = 2$ V, with $V_{DD} = 3$ V. Consequently, when $V_{IN} = 0$ V, $V_N = V_P = 3$ V result, so only Mn
is ON, causing $V_{OUT} = 0$ V. Similarly, when $V_{IN} = 3$ V, $V_N = V_P = 0$ V occur, thus only Mp is ON,
resulting in $V_{OUT} = 3$ V. Because each inverter has a distinct transition voltage, as $V_{IN}$ grows from 0 V to 3 V
it first deactivates Mn, thus causing the output transistors (Mn-Mp) to be both momentarily OFF (for
1 V < $V_{IN}$ < 2 V). Only when $V_{IN}$ reaches 2 V is Mp turned ON, thus preventing any contention between
Mn and Mp, resulting in a fast transition. A similar analysis can be made for $V_{IN}$ when returning from 3 V
to 0 V. The amount of hysteresis is given by $V_{TR2} - V_{TR1}$. (The determination of $V_{TR}$ for a CMOS inverter
was seen in Section 9.5; see Equation 9.7.)
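
The sequence of states described for Figure 11.30(c) can be traced with a small behavioral sketch. The Python code below is a hedged illustration only: the inverters are idealized as abrupt switches, the names `inverter` and `st_output` are hypothetical, and the floating output is assumed to simply keep its previous value. The voltages (1 V, 2 V, and 3 V) are those quoted in the text.

```python
V_DD, V_TR1, V_TR2 = 3.0, 1.0, 2.0   # values quoted in the text for Figure 11.30(c)


def inverter(v_in, v_tr):
    """Idealized inverter: V_DD below its transition voltage, 0 V otherwise."""
    return V_DD if v_in < v_tr else 0.0


def st_output(v_in, v_out_prev):
    """Return (v_out, state) of the push-pull stage for one input value."""
    v_n = inverter(v_in, V_TR1)       # drives the gate of Mn (nMOS pull-down)
    v_p = inverter(v_in, V_TR2)       # drives the gate of Mp (pMOS pull-up)
    if v_n == V_DD:                   # Mn ON (Mp is necessarily OFF here)
        return 0.0, "Mn ON"
    if v_p == 0.0:                    # Mp ON (Mn is necessarily OFF here)
        return V_DD, "Mp ON"
    return v_out_prev, "floating"     # both OFF: the output keeps its value


# Sweep the input up and then back down (volts).
sweep = [0.0, 0.5, 1.5, 2.5, 3.0, 2.5, 1.5, 0.5, 0.0]
v_out, trace = 0.0, []
for v_in in sweep:
    v_out, state = st_output(v_in, v_out)
    trace.append((v_in, v_out, state))

for row in trace:
    print(row)
```

At $V_{IN} = 1.5$ V the sketch reports the floating state with the output low on the way up and high on the way down, which is the hysteresis of $V_{TR2} - V_{TR1} = 1$ V mentioned above.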


11.14 Memories 


Viewed from a memory-read point of view, traditional memories (RAM, ROM) are combinational 
circuits because their output depends solely on their current input (that is, a memory-read access is not 
affected by previous memory-read accesses). Therefore, such memory circuits could be covered in this 
chapter. However, due to their complex and specialized construction, they are described in two separate 
chapters, one for volatile memories (Chapter 16) and the other for nonvolatile memories (Chapter 17). 
Additionally, in Chapter 20 it is also shown how VHDL can be used to infer memories. 
11.15 Exercises


1. Combinational x sequential logic


The circuit of Figure E11.1 is similar to that in Figure 11.1(c). Write the equation for y with the switch 
in position (1), then with it in position (2). Compare the two results. Is the first circuit combinational 
and the second sequential? (Hint: If the function is combinational, the expression for y is nonrecur- 
sive, that is, y appears only on one side of the expression.) 


FIGURE E11.1.


2. Compound-gate #1 


Write the equation for the function y implemented by each circuit of Figure E11.2 (note that one does 
not contain an inverter at the output). 


FIGURE E11.2. 


3. Compound-gate #2


Draw the CMOS circuit for the following functions:
a. $y = a + b \cdot c + d \cdot e \cdot f$
b. $y = a \cdot (b + c) \cdot (d + e + f)$
c. Are there any similarities between the resulting circuits?


4. Compound-gate #3


Consider the function $y = a \cdot b \cdot c$.
a. Draw its CMOS circuit using the SOP-based procedure described in Section 11.4, Figure 11.3(a).
b. Draw its CMOS circuit using the POS-based procedure described in Section 11.4, Figure 11.3(b).
Draw it employing the equivalent equation $y = (a + 0) \cdot (b + 0) \cdot (c + 0)$, and show that exactly the
same circuit results as that in part (a) above.
5. Compound-gate #4


Consider the function $y = a \cdot (b + c') \cdot (a' + b' + c)$.
a. Simplify it to a minimum SOP format then draw the corresponding CMOS circuit.
b. Simplify it to a minimum POS format then draw the corresponding CMOS circuit. (Suggestion:
Recall that $u' \cdot v' + u \cdot v = (u' + v) \cdot (u + v')$.)


6. Compound-gate #5 


Convert both CMOS circuits designed in the exercise above to pseudo-nMOS logic, then compare 
and comment on the results. 


7. Compound-gate #6 
Consider the 3-input XOR function $y = a \oplus b \oplus c$.
a. Manipulate this expression to obtain the corresponding irreducible SOP equation. 
b. Draw a circuit that implements the resulting equation using pseudo-nMOS logic. 
c. Repeat part (b) for dynamic unfooted logic. 

8. Address decoder symbol 


Suppose that a memory chip has capacity for 64k bits, divided into 4k words of 16 bits each. Know- 
ing that an address decoder is used to activate the word lines, do the following: 


a. Draw a symbol for this address decoder. 
b. What is the value of N in this case? 
9. Address decoder with NAND gates 
Figure E11.9 shows an N=3 address decoder and corresponding truth table. 
a. Obtain the (optimal) Boolean expression for each output (Karnaugh maps can be helpful). 


b. Implement the circuit using only NAND gates (plus inverters, if necessary). 


Inputs: a2 a1 a0. Outputs: b7 b6 b5 b4 b3 b2 b1 b0.

a2a1a0 | b7...b0
000 | 00000001
001 | 00000010
010 | 00000100
011 | 00001000
100 | 00010000
101 | 00100000
110 | 01000000
111 | 10000000


FIGURE E11.9.


10. Address decoder with enable #1 


Modify your solution to the exercise above to introduce an output-enable (ena) port, as in 
Figure 11.9, which should cause the circuit to operate as a regular decoder when asserted 
(ena ='1'), but turn all outputs low when unasserted. 
11. Address decoder with enable #2


Similarly to the exercise above, but without modifying your solution to Exercise 11.9, introduce 
additional gates, as indicated in Figure E11.11, to allow the inclusion of an output-enable (ena) port, 
so the circuit operates as a regular decoder when ena='1', or lowers all outputs when ena='0'. 


FIGURE E11.11. 


12. Address decoder with high-impedance output 


Still regarding the address decoder of Figures E11.9 and E11.11, assume that now the enable 
port, when unasserted, must turn the outputs into a high-impedance state (see tri-state buffers in 
Section 4.8) instead of turning them low. Include the appropriate circuit for that to happen in the 
box marked with a question mark in Figure E11.11. 


13. Address decoder with pseudo-nMOS logic 


For the N=3 address decoder of Figure E11.9, after obtaining the corresponding output equations, 
draw a NOR-type implementation using pseudo-nMOS logic (as in Figure 11.8(d)). 


14. Address decoder with more inputs #1 


Construct a 5-bit address decoder using only 3-bit address decoders. 


FIGURE E11.14.


15. Address decoder with more inputs #2 


Construct a 5-bit address decoder using only 3-bit address decoders. 
16. Address-decoder functional analysis


Redo the plots of Example 11.3 (Figure 11.11), this time considering that the circuit is operating in
low frequency, so all internal propagation delays are negligible.


17. Address-decoder timing analysis


Redo the plots of Example 11.3 for the circuit operating in high frequency and with the following
gate delays: $t_{p\_INV} = 2$ ns, $t_{p\_AND} = 3$ ns.


18. SSD decoder


In Example 11.4, the design of an SSD decoder, using positive logic, was seen. Repeat that design,
this time for inverted logic (that is, common-anode, in which case the segments are lit with '0' instead
of with '1'). Start by writing the truth table, then draw Karnaugh maps and extract the optimal
equations for the segments, and finally draw the corresponding circuit.


19. Address encoder


For an N=3 address encoder, do the following:
a. Draw a symbol for it (as in Figure 11.12).
b. Write the corresponding truth table.
c. Write the (optimal) Boolean expression for each output bit.
d. Draw a circuit for it using regular gates.


20. Multiplexer symbols


Draw the symbols for the following multiplexers (make sure to show or indicate the number of pins
in each input and output):
a. 5x1 mux
b. 5x8 mux
c. 8x16 mux


21. Multiplexer with NAND gates


Draw a circuit for a 4x1 multiplexer using only NAND gates (plus inverters, if necessary).


22. Multiplexer with TGs


Draw a circuit for a 4x1 multiplexer using TGs (plus inverters, if necessary). Include a buffer at the
output.


23. Multiplexer with more inputs


Using only 4x1 multiplexers, construct the 8x1 multiplexer shown in Figure E11.23.


24. Multiplexer with more bits per input


Using only 4x1 multiplexers, construct the 4x2 multiplexer shown in Figure E11.24.
FIGURE E11.23.


FIGURE E11.24.


25. Multiplexer with high-impedance output #1 


The multiplexer of Figure E11.25 is the same as that in Figure E11.23. However, an additional block 
is shown at the output, which must cause y to go into a high-impedance state when desired. Include 
that part of the circuit in your solution to Exercise 11.23. 


FIGURE E11.25.


26. Multiplexer with high-impedance output #2 


Similarly to the exercise above, the multiplexer of Figure E11.26 is the same as that in 
Figure E11.24. However, an additional block is shown at the output, which must cause y to go 
into a high-impedance state when desired. Include that part of the circuit in your solution to 
Exercise 11.24. 
FIGURE E11.26.


27. Multiplexer functional analysis 


Figure E11.27 shows a 2x1 multiplexer constructed with conventional gates. Supposing that the
circuit is submitted to the signals depicted in the accompanying timing diagram, draw the waveforms
at all circuit nodes. Assume that the propagation delays are negligible (functional analysis).


FIGURE E11.27.


28. Multiplexer timing analysis 


Figure E11.28 shows the same 2x1 multiplexer seen in the previous exercise. Suppose now that it
is operating near its maximum frequency, so the internal propagation delays must be considered.
Say that they are $t_{p\_INV} = 2$ ns and $t_{p\_NAND} = 3$ ns. Draw the remaining waveforms using the
simplified style seen in Figure 4.8(b) and considering that the vertical lines are 1 ns apart.


FIGURE E11.28.
29. Parity detector


Consider a 4-bit even parity detector (the circuit must produce '1' at the output when the number of
inputs that are high is even).
a. Write its Boolean expression without using the XOR operator ($\oplus$).
b. Draw a circuit for it using only NAND gates (plus inverters, if necessary).


30. Priority encoder


Two implementations for the priority encoder of Figure 11.21(a) were shown in Figure 11.22, both
employing AND gates. Draw an equivalent implementation, but using only OR and/or NOR gates
instead (inverters are allowed). Moreover, the dissimilar bit should be low instead of high.


31. Binary sorter


With respect to the binary sorter seen in Section 19.9, write the expressions below as a function of N:
a. How many cells are needed to construct a full N-bit sorter?
b. How many cells are needed to construct an N-bit majority function?
c. How many cells are needed to construct an N-bit median function?


32. Logical rotator


In Figure 11.24, the implementation of a logical right rotator with linear size (N columns, where N is
the number of inputs) was shown. And, in Figure 11.25, an arithmetic right shifter with logarithmic
size ($\log_2 N$ columns) is depicted. Implement an N=4 logical right rotator with logarithmic size.


33. Logical shifter


In Figure 11.26, the implementation of a generic logical left shifter was presented. Modify it to
become a logical right shifter.


34. Nonoverlapping clock generators


The questions below regard the nonoverlapping clock generators seen in Section 11.11.
a. What happens to $\phi_1$ and $\phi_2$ in Figure 11.27(a) if the NOR gates are replaced with NAND gates?
b. What happens to $\phi_1$ and $\phi_2$ in Figure 11.27(b) if the AND gates are replaced with OR gates?


35. Schmitt trigger


Examine the datasheets of modern CPLD/FPGA chips and find at least two devices that employ
STs in their pads. What is the amount of hysteresis (normally in the 50 mV-500 mV range) of each
device that you found?


11.16 Exercises with VHDL 


See Chapter 20, Section 20.7. 


11.17 Exercises with SPICE 


See Chapter 25, Section 25.13. 
Combinational Arithmetic Circuits


Objective: The study of combinational circuits is divided into two parts. The first part, called 
combinational logic circuits, deals with logical functions and was the subject of Chapter 11. The second part, 
called combinational arithmetic circuits, deals with arithmetic functions and is the subject of this chapter. 
This type of design will be further illustrated using VHDL in Chapter 21. 


Chapter Contents 


12.1 Arithmetic versus Logic Circuits
12.2 Basic Adders
12.3 Fast Adders
12.4 Bit-Serial Adder
12.5 Signed Adders/Subtracters
12.6 Incrementer, Decrementer, and Two's Complementer
12.7 Comparators
12.8 Arithmetic-Logic Unit
12.9 Multipliers
12.10 Dividers
12.11 Exercises
12.12 Exercises with VHDL
12.13 Exercises with SPICE


12.1 Arithmetic versus Logic Circuits


The study of combinational circuits is divided into two parts depending on the type of function that the 
circuit implements. 

The first part, called combinational logic circuits, was seen in the previous chapter. As the name indi- 
cates, such circuits implement logical functions, like AND, OR, XOR, multiplexers, address encoders/ 
decoders, parity detectors, barrel shifters, etc. 

The second part is called combinational arithmetic circuits and is discussed in this chapter. Again, as the 
name says, such circuits implement arithmetic functions, like adders, subtracters, multipliers, and dividers. 
A wide range of circuits will be presented along with discussions on signed systems and application 
alternatives. 
12.2 Basic Adders


Several adder architectures will be presented and discussed in this chapter. The discussion starts with a 
review of the fundamental single-bit unit known as full-adder, followed by the most economical multibit 
architecture (in terms of hardware), called carry-ripple adder, then proceeding to high performance struc- 
tures, like Manchester adder, carry-skip adder, and carry-lookahead adder. A bit-serial adder is also included. 
Finally, the adder-related part is concluded with a section on signed adders/subtracters. 

In summary, the following adder-related circuits will be seen: 


Full-adder unit (Section 12.2) 

Carry-ripple adder (Section 12.2) 
Manchester carry-chain adder (Section 12.3) 
Carry-skip adder (Section 12.3) 

Carry-select adder (Section 12.3) 
Carry-lookahead adder (Section 12.3) 
Bit-serial adder (Section 12.4) 

Signed adders/subtracters (Section 12.5) 


Incrementer and decrementer (Section 12.6) 


12.2.1 Full-Adder Unit 


A full-adder (FA) unit is depicted in Figure 12.1. Two equivalent symbols are shown in Figures 12.1(a)-(b), 
and the circuit’s truth table is shown in Figure 12.1(c). The variables a and b represent the input bits to 
be added, cin is the carry-in bit (to be also added to a and b), s is the sum bit, and cout is the carry-out bit. 
As shown in Figure 12.1(c), s must be high whenever the number of inputs that are high is odd (odd-parity 
function), while cout must be high when two or more inputs are high (majority function). 

From the truth table of Figure 12.1(c), the following SOP expressions are obtained for s and cout: 


$$s = a \oplus b \oplus \mathit{cin} \tag{12.1}$$
$$\mathit{cout} = a \cdot b + a \cdot \mathit{cin} + b \cdot \mathit{cin} \tag{12.2}$$

a b cin | s cout
0 0 0 | 0 0
0 0 1 | 1 0
0 1 0 | 1 0
0 1 1 | 0 1
1 0 0 | 1 0
1 0 1 | 0 1
1 1 0 | 0 1
1 1 1 | 1 1


FIGURE 12.1. (a)-(b) Full-adder symbols and (c) truth table.
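
As a quick check of Equations 12.1 and 12.2, the full-adder can be written as a bit-level function and enumerated over all eight input combinations. This is a minimal Python sketch for illustration (the book's own design examples use VHDL); the function name `full_adder` is an assumption of the sketch.

```python
from itertools import product


def full_adder(a, b, cin):
    """Single-bit full-adder on 0/1 integers; returns (s, cout)."""
    s = a ^ b ^ cin                          # Equation 12.1 (odd-parity function)
    cout = (a & b) | (a & cin) | (b & cin)   # Equation 12.2 (majority function)
    return s, cout


print("a b cin | s cout")
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    print(f"{a} {b} {cin}   | {s} {cout}")
```

In every row, the two-bit word formed by cout and s equals the arithmetic sum a + b + cin.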
Implementation examples for both expressions are depicted in Figure 12.2. In Figure 12.2(a), the sum
is computed by a conventional XOR gate and the carry-out bit is computed by conventional NAND
gates. In Figure 12.2(b), a more efficient (transistor-level) CMOS design is shown, where the computations
of the sum and carry-out are combined (that is, the computation of s takes advantage of computations
that are necessary for cout), which requires 28 transistors. In Figure 12.2(c), a mux-based full-adder is
shown, which requires 18 transistors if the multiplexers are implemented with TGs (Figure 11.16(b)) or
just 8 transistors (at the expense of speed) if implemented with PTs (Figure 11.16(c)).


12.2.2 Carry-Ripple Adder 


Fundamental concepts related to addition and subtraction were studied in Sections 3.1-3.2.
Figure 12.3(a) illustrates the operation of a multibit adder. The values in the gray area are given, while
the others must be calculated. The inputs are $a = a_3a_2a_1a_0$ and $b = b_3b_2b_1b_0$, plus the carry-in bit, $c_0$. The
outputs are the sum and carry vectors, that is, $s = s_3s_2s_1s_0$ and $c = c_4c_3c_2c_1$, where $c_4$ is the carry-out bit.

FIGURE 12.2. Full-adder implementations: (a) Sum and carry-out computed by conventional gates; (b) 28T
CMOS design, with combined computations for s and cout; (c) Mux-based full-adder, which requires 18T when
the multiplexers are constructed as in Figure 11.16(b) or just 8T (at the expense of speed) if implemented as
in Figure 11.16(c).

FIGURE 12.3. (a) In-out signals in a multibit adder; (b) Carry-ripple adder (4 bits); (c) Carry-ripple adder
without the inverters in the FA units (Figure 12.2(b)) to reduce the delay of the critical path.

The simplest multibit adder is the carry-ripple adder, shown in Figure 12.3(b). For an N-bit circuit, it 
consists of N full-adder units connected in series through their carry-in/carry-out ports. Each FA adds 
three bits to produce sum and carry-out bits. Because the carry must propagate (ripple) through all 
stages serially, this is the slowest adder architecture. Roughly speaking, the time required by the FA unit 
to compute the carry bit is two gate delays (see Figure 12.2(a)). Therefore, the total delay in the carry- 
ripple adder is of the order of 2N gate delays. On the other hand, this is the most economical adder in 
terms of silicon area and, in general, also in terms of power consumption. 
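
The serial carry propagation described above can be sketched in a few lines of Python. This is a functional illustration only (it does not model gate delays), and the helper names `carry_ripple_add`, `to_bits`, and `from_bits` are assumptions of the sketch. Bit lists are ordered with index 0 as the least significant bit.

```python
def full_adder(a, b, cin):
    """Single-bit full-adder; returns (s, cout)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def carry_ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (index 0 = LSB); returns (sum_bits, cout)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same number of bits")
    carry, s_bits = cin, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # each stage waits for the previous carry
        s_bits.append(s)
    return s_bits, carry


def to_bits(value, n):
    """Unsigned integer to an n-bit list, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Bit list (LSB first) back to an unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


s_bits, cout = carry_ripple_add(to_bits(6, 4), to_bits(11, 4))
print(from_bits(s_bits), cout)   # 1 1  (6 + 11 = 17 = 16 + 1)
```

The loop makes the source of the delay visible: stage i cannot produce its outputs before stage i-1 has delivered its carry.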

Because the critical path in any adder involves the computation of the carry (because any stage needs 
information regarding all preceding stages to compute its own sum and carry-out bits), anything that can 
be removed from that path reduces the internal delay and, consequently, improves the adder’s speed. 
This situation is depicted in Figure 12.3(c), where the inverters at the sum (s) and carry-out (cout) outputs 
(see Figure 12.2(b)) of all FA units were removed (FA’ denotes an FA without these two inverters). Even 
though this implies that some inverters are now needed for a and b, and that some inverters for s must 
be reincluded (in every other unit), they are not in the critical path, so do not affect the speed. Note also 
that the final carry-out bit only needs an inverter if the number of FA’ units is odd. 


EXAMPLE 12.1 SIGNED/UNSIGNED ADDITION


Suppose that a="0001", b="0110", c="1011", and cin='0'. Determine a+b, a+c, and b+c in Figure 12.3 
for the following cases: 


a. Assuming that the system is unsigned 
b. Assuming that the system is signed 


SOLUTION 


As seen in Sections 3.1-3.2, the adder operation is independent from the signs of a and b (that is,
independent from the fact that the system is signed or unsigned). What changes from one case to
the other is the meaning of the binary words, that is, for an unsigned 4-bit system, the range of any
number is from 0 to 15, while for a signed 4-bit system it is from -8 to +7. In this example, the adder's
outputs are:


a+b="0001"+"0110"="0111", cout='0';
a+c="0001"+"1011"="1100", cout='0';
b+c="0110"+"1011"="0001", cout='1'.

a. Interpretation of the results above when the system is 4-bit unsigned:


a+b=1+6=7 (correct)
a+c=1+11=12 (correct)
b+c=6+11=1 (overflow)


b. Interpretation of the results above when the system is 4-bit signed:


a+b=1+6=7 (correct)
a+c=1+(-5)=-4 (correct)
b+c=6+(-5)=1 (correct)
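
The results of Example 12.1 can be reproduced with a short Python sketch. It assumes 4-bit words given as strings and two's complement representation for the signed case; the helper names `add4` and `as_signed` are hypothetical and exist only for this illustration.

```python
def add4(x, y, cin=0):
    """Add two 4-bit words given as bit strings; returns (sum_word, cout)."""
    total = int(x, 2) + int(y, 2) + cin
    return format(total & 0xF, "04b"), total >> 4


def as_signed(word):
    """Interpret a 4-bit word as a two's complement value (-8 to +7)."""
    value = int(word, 2)
    return value - 16 if value >= 8 else value


a, b, c = "0001", "0110", "1011"
for name, (x, y) in {"a+b": (a, b), "a+c": (a, c), "b+c": (b, c)}.items():
    word, cout = add4(x, y)
    print(name, word, cout, "unsigned:", int(word, 2), "signed:", as_signed(word))
```

The same output word is produced in both cases; only its interpretation changes, which is why b+c overflows as an unsigned result (cout='1') but is correct as a signed one.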
12.3 Fast Adders


We start this section by describing three signals (generate, propagate, and kill) that help in understanding 
the internal structure of adders. Their usage in the implementation of higher performance adders will 
be shown subsequently. 


12.3.1 Generate, Propagate, and Kill Signals 
To help the study of adders, three signals are often defined, called generate, propagate, and kill.


Generate (G): Must be '1' when a carry-out must be produced regardless of carry-in. For the FA unit of
Figure 12.1, this should occur when a and b are both '1', that is:


$$G = a \cdot b \tag{12.3}$$


Propagate (P): Must be '1' when exactly one of a and b is '1', in which case cout=cin, that is, the circuit
must allow cin to propagate through. Therefore:


$$P = a \oplus b \tag{12.4}$$


Kill (K): Must be '1' when a carry-out is impossible to occur, regardless of carry-in, that is, when a and
b are both '0'. Consequently:


$$K = a' \cdot b' \tag{12.5}$$


The use of these signals is illustrated in Figure 12.4, where a static CMOS implementation is shown
in Figure 12.4(a), and a dynamic (footed) option is shown in Figure 12.4(b). In Figure 12.4(c), the left half
of Figure 12.2(b) (carry part of the FA circuit) was copied to illustrate the "built-in" computations of G,
P, and K.

These signals can be related to those in the full-adder circuit seen in Figure 12.1, resulting in:


$$s = a \oplus b \oplus \mathit{cin} = P \oplus \mathit{cin} \tag{12.6}$$
$$\mathit{cout} = a \cdot b + a \cdot \mathit{cin} + b \cdot \mathit{cin} = a \cdot b + (a + b) \cdot \mathit{cin} = a \cdot b + (a \oplus b) \cdot \mathit{cin} = G + P \cdot \mathit{cin} \tag{12.7}$$


Note the equality $a \cdot b + (a + b) \cdot \mathit{cin} = a \cdot b + (a \oplus b) \cdot \mathit{cin}$ employed in Equation 12.7, which can be easily
verified with a Karnaugh map.
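
Besides a Karnaugh map, the equality used in Equation 12.7 can be confirmed by exhaustive enumeration, since there are only eight input combinations. The Python sketch below is an illustration under that approach; the function names `gpk` and `carry_forms` are assumptions of the sketch.

```python
from itertools import product


def gpk(a, b):
    """Generate, propagate, and kill signals (Equations 12.3-12.5)."""
    return a & b, a ^ b, (1 - a) & (1 - b)


def carry_forms(a, b, cin):
    """The three carry-out expressions that appear in Equation 12.7."""
    g, p, _ = gpk(a, b)
    majority = (a & b) | (a & cin) | (b & cin)   # original SOP form
    with_or = (a & b) | ((a | b) & cin)          # form using a + b
    with_xor = g | (p & cin)                     # form using G and P
    return majority, with_or, with_xor


for a, b, cin in product((0, 1), repeat=3):
    forms = carry_forms(a, b, cin)
    print(a, b, cin, "G,P,K =", gpk(a, b), "forms agree:", len(set(forms)) == 1)
```

The enumeration also shows that exactly one of G, P, and K is '1' for any pair (a, b), so the three signals partition the input cases.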


FIGURE 12.4. (a) Static and (b) dynamic carry circuits operating with the G, P, and K signals defined by
Equations 12.3-12.5; (c) Built-in G, P, and K signals in the carry section of the FA unit of Figure 12.2(b).
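The equations above can be extended to multibit circuits, like that in Figure 12.3. The resulting
generalized expressions are (i represents the adder's ith stage):

$$s_i = P_i \oplus c_i \tag{12.8}$$
$$c_{i+1} = G_i + P_i \cdot c_i \tag{12.9}$$

where:

$$G_i = a_i \cdot b_i \tag{12.10}$$
$$P_i = a_i \oplus b_i \tag{12.11}$$

The computation of the carry vector for a 4-bit adder is then the following (Equation 12.9):

$$c_0 = \mathit{cin} \tag{12.12}$$
$$c_1 = G_0 + P_0 \cdot c_0 \tag{12.13}$$
$$c_2 = G_1 + P_1 \cdot (G_0 + P_0 \cdot c_0) \tag{12.14}$$
$$c_3 = G_2 + P_2 \cdot (G_1 + P_1 \cdot (G_0 + P_0 \cdot c_0)) \tag{12.15}$$
$$c_4 = G_3 + P_3 \cdot (G_2 + P_2 \cdot (G_1 + P_1 \cdot (G_0 + P_0 \cdot c_0))) = \mathit{cout} \tag{12.16}$$

Developing the equations above, the following expressions result:

$$c_1 = \underbrace{G_0}_{G_{0:0}} + \underbrace{P_0}_{P_{0:0}} \cdot c_0 = G_{0:0} + P_{0:0} \cdot c_0 \tag{12.17}$$
$$c_2 = \underbrace{G_1 + P_1 \cdot G_0}_{G_{1:0}} + \underbrace{P_1 \cdot P_0}_{P_{1:0}} \cdot c_0 = G_{1:0} + P_{1:0} \cdot c_0 \tag{12.18}$$
$$c_3 = \underbrace{G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0}_{G_{2:0}} + \underbrace{P_2 \cdot P_1 \cdot P_0}_{P_{2:0}} \cdot c_0 = G_{2:0} + P_{2:0} \cdot c_0 \tag{12.19}$$
$$c_4 = \underbrace{G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0}_{G_{3:0}} + \underbrace{P_3 \cdot P_2 \cdot P_1 \cdot P_0}_{P_{3:0}} \cdot c_0 = G_{3:0} + P_{3:0} \cdot c_0 \tag{12.20}$$

where $G_{i:j}$ and $P_{i:j}$ are called group generate and group propagate, respectively. Notice that neither
depends on carry-in bits, so they can be computed in advance. These signals (P, G, K, and their
derivations) will be used in the analysis of all adder architectures that follow.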
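
The equivalence between the recursive carry expressions (Equation 12.9) and the developed ones based on group generate and group propagate can be checked numerically. The Python sketch below is an illustration, not the book's implementation; the function names are assumptions, and bit lists use index 0 for stage 0 (the least significant bit).

```python
def group_gp(G, P, i, j=0):
    """Group generate and group propagate over stages j..i."""
    g, p = G[j], P[j]
    for k in range(j + 1, i + 1):
        g = G[k] | (P[k] & g)   # generated at stage k, or generated below and propagated
        p = P[k] & p            # every stage in the group must propagate
    return g, p


def carries_developed(a_bits, b_bits, c0):
    """Carry vector [c0, ..., cN]; each carry is computed from c0 alone."""
    G = [a & b for a, b in zip(a_bits, b_bits)]
    P = [a ^ b for a, b in zip(a_bits, b_bits)]
    carries = [c0]
    for i in range(len(G)):
        g, p = group_gp(G, P, i)
        carries.append(g | (p & c0))
    return carries


def carries_recursive(a_bits, b_bits, c0):
    """Carry vector [c0, ..., cN] from the stage-by-stage recursion."""
    carries = [c0]
    for a, b in zip(a_bits, b_bits):
        carries.append((a & b) | ((a ^ b) & carries[-1]))
    return carries


# a = "0110" and b = "1011", written LSB first.
print(carries_developed([0, 1, 1, 0], [1, 1, 0, 1], 0))   # [0, 0, 1, 1, 1]
print(carries_recursive([0, 1, 1, 0], [1, 1, 0, 1], 0))   # [0, 0, 1, 1, 1]
```

Both functions return the same vector; the difference is that in `carries_developed` no carry depends on another carry, only on the input bits and on $c_0$.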


12.3.2 Approaches for Fast Adders 


The bottleneck of any adder relates to the fact that any stage needs information regarding all preceding
bits to be able to compute its own sum and carry-out bits. The carry-ripple adder seen earlier constitutes
the simplest architecture in which the information (carry) is passed from stage to stage, thus demanding
a larger time to complete the computations.

Any architecture intended to make the circuit faster falls in one of the following two general
categories: (i) faster carry propagation (reduction of the time required for the carry signal to propagate 
through the cell) or (ii) faster carry generation (local computation of the carry, without having to wait for 
signals produced by preceding stages). 

Both approaches are depicted in Figure 12.5, with Figure 12.5(a) indicating a faster carry propagation
and Figure 12.5(b) indicating a faster carry generation (each stage generates its own carry-in bit). To attain
a high performance adder, both aspects must be considered. However, some architectures emphasize mainly
faster carry propagation (e.g., Manchester carry-chain adder), while others concentrate fundamentally on
faster carry generation (e.g., carry-lookahead adder).


12.3.3 Manchester Carry-Chain Adder 


The Manchester carry-chain adder falls in the category of fast adders depicted in Figure 12.5(a), that
is, it is a carry-propagate adder in which the delay through the carry cells is reduced. It can be static
or dynamic. An example of the latter is presented in Figure 12.6(a), where the individual cells are from
Figure 12.4(b). This circuit implements Equations 12.13-12.16, with G and P given by Equations 12.10-12.11.
Alternatively, $P_i = a_i + b_i$ can be used to compute P. Thanks to these parameters (G and P), the delay
in each cell is just one gate-delay, which is an improvement over the carry-ripple adder seen earlier.
The critical path corresponds to $P_0 = P_1 = P_2 = P_3 = \ldots =$ '1' and $G_0 = G_1 = G_2 = G_3 = \ldots =$ '0', in which case all
P-controlled transistors are serially inserted into the carry path (that is, the last carry bit is determined by
cin). However, the parasitic capacitance of this long line limits the usefulness of this approach to about
4 to 8 bits. (Note that $c_0$ is just a buffered version of cin.)

Another representation for the Manchester carry-chain adder is presented in Figure 12.6(b), in which 
the critical (longest) path mentioned above can be easily observed. 


FIGURE 12.5. General approaches for the design of fast adders: (a) Faster carry propagation (reduction of
time needed for the carry to propagate through the cells) and (b) faster carry generation (each stage
computes its own carry-in bit).

FIGURE 12.6. (a) Dynamic Manchester carry-chain adder (the cells are from Figure 12.4(b)), where $G_i = a_i \cdot b_i$ and
$P_i = a_i \oplus b_i$ or $P_i = a_i + b_i$; (b) Another representation for the same circuit highlighting the critical (longest) path.


12.3.4 Carry-Skip Adder 


Suppose that we want to construct a 4-bit (adder) block, which will be associated to other similar blocks
to attain a larger adder. We mentioned in the description of the Manchester carry-chain adder above
that its critical path corresponds to $P_0 = P_1 = P_2 = P_3 =$ '1' because then the last carry bit (cout $= c_4$) is
determined by the first carry bit ($c_0 =$ cin), meaning that such a signal must traverse the complete carry circuit.
Because this only occurs when $c_4 = c_0$, a simple modification can be introduced into the original circuit,
which consists of bypassing the carry circuit when such a situation occurs. This modification, depicted
in Figure 12.7, yields what is called the carry-skip or carry-bypass adder. For simplicity, only the carry circuit is
shown in each block, with the GP section (circuit that generates the G and P signals) and the sum section
omitted. A multiplexer was included at the output of the carry circuit, which selects either the carry
generated by the circuit (when $P_{3:0}$ = '0') or directly selects the input carry (when $P_{3:0}$ = '1'), thus
eliminating the critical path described above.

By looking at a single block, the overall reduction of propagation delay is of little significance because 
now a multiplexer was introduced into the new critical path. However, when we look at the complete 
system (with several blocks, as in Figure 12.7), the actual benefit becomes apparent. Suppose that our 
system contains 4 blocks of 4 bits each and that the last carry bit has been generated by the second 
stage of the first block (which is now the critical path). Even though the reduction in the propagation 
delay in the first block might be negligible (because one stage was traded for one multiplexer), the 
bypass structure will cause this carry bit to skip all the other blocks. In other words, the total carry 
delay in an n-block system cannot be larger than that of one block plus n multiplexers (note that this is 
the critical carry delay; the total delay is this plus that needed by the last sum section to compute the 
sum vector). 
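
The bypass mechanism can be sketched functionally in Python. This is a hedged illustration only: it reproduces the logic of the multiplexer (select the block's input carry when the group propagate is '1'), not the timing benefit, and the names `block_carry`, `carry_skip_add`, `to_bits`, and `from_bits` are assumptions of the sketch. Bit lists are LSB first.

```python
def block_carry(a_bits, b_bits, cin):
    """One block: returns (sum_bits, chain_cout, group_propagate)."""
    carry, s_bits, group_p = cin, [], 1
    for a, b in zip(a_bits, b_bits):
        g, p = a & b, a ^ b
        s_bits.append(p ^ carry)
        carry = g | (p & carry)
        group_p &= p                 # '1' only if every stage propagates
    return s_bits, carry, group_p


def carry_skip_add(a_bits, b_bits, cin=0, block=4):
    """Carry-skip addition over blocks of `block` bits; returns (sum_bits, cout)."""
    s_bits, carry = [], cin
    for start in range(0, len(a_bits), block):
        blk_s, chain_cout, group_p = block_carry(
            a_bits[start:start + block], b_bits[start:start + block], carry)
        s_bits.extend(blk_s)
        # Bypass multiplexer: skip the chain when the whole block propagates.
        carry = carry if group_p else chain_cout
    return s_bits, carry


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


s_bits, cout = carry_skip_add(to_bits(0x0F, 8), to_bits(0xF0, 8), cin=1)
print(from_bits(s_bits), cout)   # 0 1  (15 + 240 + 1 = 256)
```

In this example both blocks have all propagate signals at '1', so the input carry is selected by the multiplexer in each block instead of the carry delivered by the chain.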
12.3.5 Carry-Select Adder


Another fast adder architecture is depicted in Figure 12.8. In a traditional adder (like Manchester 
carry-chain) each block consists of three sections: GP section (where G and P are generated), carry circuit 
(which generates the carry bits), and the sum section (where the output signals—sum bits—are computed). 
The difference between this traditional architecture and that of Figure 12.8 is that the latter contains 
two carry sections per block instead of one, and it also contains a multiplexer. One of the carry circuits 
operates as if the carry-in bit were '0', while the other operates as if it were '1', so the computations can 
be performed in advance. The actual carry bit, when ready, is used at the multiplexer to select the right 
carry vector, which is fed to the sum section where the output vector is finally computed. The carry-out 
bit from one block is used to operate the multiplexer of the subsequent block. Note that in the first block
the use of two carry sections is actually not needed because $c_0$ (the global carry-in bit) is already
available at the beginning of the computations. The main drawback of the carry-select architecture is the extra
hardware (and power consumption) needed for the additional sections.
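
The carry-select idea (compute both carry vectors in advance, then pick one) can be sketched as follows. The Python code is a functional illustration under simplifying assumptions: delays are not modeled, every block (including the first) uses two carry sections, and the function names are hypothetical. Bit lists are LSB first.

```python
def carry_vector(G, P, cin):
    """Carries [c_in, ..., c_out] of one block for a given carry-in."""
    carries = [cin]
    for g, p in zip(G, P):
        carries.append(g | (p & carries[-1]))
    return carries


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Carry-select addition over blocks of `block` bits; returns (sum_bits, cout)."""
    s_bits, carry = [], cin
    for start in range(0, len(a_bits), block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        G = [a & b for a, b in zip(a_blk, b_blk)]   # GP section
        P = [a ^ b for a, b in zip(a_blk, b_blk)]
        c_if0 = carry_vector(G, P, 0)               # carry section assuming cin = '0'
        c_if1 = carry_vector(G, P, 1)               # carry section assuming cin = '1'
        c = c_if1 if carry else c_if0               # multiplexer driven by the actual carry
        s_bits.extend(p ^ ci for p, ci in zip(P, c))   # sum section (Equation 12.8)
        carry = c[-1]
    return s_bits, carry


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


s_bits, cout = carry_select_add(to_bits(200, 8), to_bits(100, 8))
print(from_bits(s_bits), cout)   # 44 1  (200 + 100 = 300 = 256 + 44)
```

Both carry vectors of a block depend only on that block's operand bits, which is what allows them to be prepared before the actual carry arrives.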


12.3.6 Carry-Lookahead Adder 


In the carry-skip circuit (Figure 12.7), each block contained a carry-propagate type of adder (e.g., carry-ripple
adder, Figure 12.3, or Manchester carry-chain adder, Figure 12.6). Therefore, the internal block operation
requires the carry to propagate through the circuit before the sum can actually be computed. The fundamental
difference between that approach and the carry-lookahead approach is that in the latter each stage computes
its own carry-in bit without waiting for information from preceding stages. This approach, therefore,
corresponds to that depicted in Figure 12.5(b).


FIGURE 12.7. Carry-skip (also called carry-bypass) adder. The block propagate signals are
P3:0 = P3P2P1P0 and P7:4 = P7P6P5P4.


FIGURE 12.8. Carry-select adder. 
Mathematically speaking, this means that the circuit implements Equations 12.17-12.20 instead of
Equations 12.13-12.16, that is, each stage computes its own group generate and group propagate signals,
which are independent of the carry (so they can be computed in advance). When these expressions (which
produce the carry vector) are implemented, Equation 12.8 can be employed to obtain the sum vector.
This architecture is depicted in Figure 12.9, where two 4-bit blocks are shown.

In the first block of Figure 12.9, the first stage computes the pair (G0:0, P0:0) in the GP layer, from which
the carry bit c1 is obtained in the carry layer, eventually producing s1 in the sum layer. A similar
computation occurs in the succeeding stages, up to the fourth stage, where the pair (G3:0, P3:0) is computed in
the GP layer, producing c4 in the carry layer, which propagates to the next block. The main drawbacks of
this approach are the large amount of hardware needed to compute Gi:j and also its high fan-in (which
reduces its speed), hence limiting its usefulness to blocks of about 4 bits. These limitations are
illustrated in the example that follows.
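
A minimal Python sketch of one lookahead block is given below. It assumes the usual flattened expressions, in which G(i:0) collects every way a carry can be generated at or below bit i and propagated up to it, P(i:0) is the AND of all propagate bits, and the carry is c(i+1) = G(i:0) + P(i:0)·c0. The function names are hypothetical, and the code is a behavioral model rather than a gate-level description.

```python
def group_gp(g, p, i):
    """Return (G_{i:0}, P_{i:0}) computed directly from the bit-level g and p signals."""
    G = 0
    for k in range(i + 1):
        term = g[k]                      # carry generated at bit k ...
        for j in range(k + 1, i + 1):
            term &= p[j]                 # ... and propagated through bits k+1..i
        G |= term
    P = 1
    for j in range(i + 1):
        P &= p[j]
    return G, P


def cla_block(a, b, c0, n=4):
    """Add two n-bit values with lookahead carries; returns (sum, carry_out)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    carries = [c0]
    for i in range(n):
        G, P = group_gp(g, p, i)
        carries.append(G | (P & c0))     # no carry waits for another stage
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n]
```

Note how the inner loops get longer as i grows: this is the software counterpart of the growing gate count and fan-in discussed in the text.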


FIGURE 12.9. Carry-lookahead adder.
EXAMPLE 12.2 FOUR-BIT CARRY-LOOKAHEAD CIRCUIT


a. Based on the discussion above, draw a circuit for a 4-bit carry-lookahead adder. Even though 
optimal implementations are developed at transistor-level rather than at gate-level (see, for 
example, Figure 12.2), use conventional gates to represent the circuit. 


b. Explain why the carry-lookahead approach is limited to blocks of about 4 bits (check the size of 
the hardware and also the fan-in of the resulting circuit; recall that the time delay grows with 
fan-in, so the latter is an important parameter for high-speed circuits). 


c. Compare its time dependency with that of a carry-ripple adder. 


SOLUTION 


Part (a): 
The circuit is shown in Figure 12.10. 


FIGURE 12.10. Complete 4-bit carry-lookahead adder of Example 12.2. 
Part (b):

The limitations of this approach can be easily observed in Figure 12.10. First, the hardware grows
nearly exponentially with the number of bits (compare the size of the carry cells from c1 to c4), which
increases the cost and the power consumption. Second, for a gate not to be too slow, its fan-in (number
of inputs) is normally limited to about four or five. However, in the computation of c4, two gates
have already reached the limit of five.


Part (c): 

A time-dependency comparison between carry-lookahead and carry-ripple adders is presented in
Figure 12.11. In the latter, the fan-in of each carry cell is fixed (two in the first gate, three in the
second; see the circuit for cout in Figure 12.2(a)), but the number of layers (cells) grows with N (number
of bits); this situation is depicted in Figure 12.11(a). On the other hand, the number of layers (cells) is
fixed in the carry-lookahead adder (see in Figure 12.10 that the computation of any carry bit requires
only one cell, composed of three layers of gates, one for GP and the other two for the carry bits), but
the fan-in grows with N, hence affecting the adder’s speed. This case is illustrated in Figure 12.11(b),
which shows a fixed number of gate layers (three) in all carry cells but with an increasing fan-in
(maximum of two in the first stage and five in the fourth).


FIGURE 12.11. Illustration of gate-delay versus fan-in for (a) a carry-ripple adder and (b) a carry-lookahead
adder. The fan-in is fixed in the former, but the number of layers grows with N, while the opposite happens
in the latter.


12.4  Bit-Serial Adder 


Contrary to all adders described above, which are parallel and truly combinational (they require no storage
cells), the adder described in this section is serial and requires memory. Because its next computation
depends on previous computations, it is a sequential circuit. Nevertheless, it is included in this chapter to
be shown along with the other adder architectures.

The bit-serial adder is depicted in Figure 12.12. It consists of an FA unit plus a D-type flip-flop (DFF,
studied in Chapter 13). The inputs are the vectors a and b, whose bits are applied to the circuit serially,
that is, a0 and b0, then a1 and b1, etc., from which the sum bits (s0, then s1, etc.) and the corresponding
carry-out bits are obtained. Note that the carry-out bit is stored by the DFF and is then used as the
carry-in bit in the next computation.


FIGURE 12.12. Serial adder (a sequential circuit).
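
The serial operation can be mimicked with a short Python sketch, in which one loop iteration plays the role of one clock cycle and a plain variable stands in for the DFF. The function name and the LSB-first list representation are assumptions made for illustration.

```python
def serial_add(a_bits, b_bits):
    """Add two equal-length bit lists given LSB first; returns (sum_bits, final_carry)."""
    carry_ff = 0                      # DFF contents, assumed reset to '0'
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s = a ^ b ^ carry_ff          # FA sum output
        cout = (a & b) | (a & carry_ff) | (b & carry_ff)  # FA carry output
        sum_bits.append(s)
        carry_ff = cout               # stored at the clock edge, used as cin next cycle
    return sum_bits, carry_ff


# 6 ("0110") + 3 ("0011"), bits listed LSB first
bits, carry = serial_add([0, 1, 1, 0], [1, 1, 0, 0])
```

Here `bits` is `[1, 0, 0, 1]` (that is, "1001" = 9 when read MSB first) and `carry` is 0.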


12.5 Signed Adders/Subtracters 


Signed addition/subtraction was described in Section 3.2. Conclusions from that section are now used 
to physically implement signed adders/subtracters. 


12.5.1 Signed versus Unsigned Adders 


Suppose that a and b are two N-bit numbers belonging to a signed arithmetic system. If so, any positive
value is represented in the same way as an unsigned value, while any negative number is represented
in two’s complement form. To add a and b, any regular adder can be used (like those seen above) regardless
of a and/or b being positive or negative because when a number is negative it is already represented
in two’s complement form, so straight addition must be performed. For example, consider that N=4,
a=5 ("0101"), and b=-7 ("1001"); then a+b=(5)+(-7)="0101"+"1001"="1110"=-2. On the other hand, to
subtract b from a, b must first undergo two’s complement transformation and then be added to a, regardless
of b being positive or negative. As an example, consider N=4, a=5 ("0101"), and b=-2 ("1110"); then
a-b=(5)-(-2)=(5)+(2)="0101"+"0010"="0111"=7.

Suppose now that a and b belong to an unsigned system. The only difference is that both are necessarily
nonnegative, so the overflow checks are slightly different, as described in Section 3.1. The rest is
exactly the same, that is, to add a and b, any adder is fine, while to subtract b from a, b must first be two’s
complemented, then added to a.

In conclusion, to perform addition and subtraction of signed and unsigned numbers all that is needed 
is a conventional adder (like any of those seen above) plus two’s complement circuitry (plus distinct 
overflow checks, of course). The insertion of two’s complement is explained below. 
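
As a quick numerical check of this conclusion, the sketch below models an N-bit adder and a two’s complementer in Python for N=4 and reuses them for both addition and subtraction. The helper names are hypothetical; overflow detection is deliberately left out.

```python
N = 4
MASK = (1 << N) - 1


def twos_complement(x):
    """Invert all bits and add 1, keeping N bits."""
    return (~x + 1) & MASK


def add(a, b):
    """Plain N-bit addition (the carry-out is discarded)."""
    return (a + b) & MASK


def sub(a, b):
    """a - b computed as a + two's complement of b."""
    return add(a, twos_complement(b))


def as_signed(x):
    """Interpret an N-bit pattern as a two's complement value."""
    return x - (1 << N) if x & (1 << (N - 1)) else x


r1 = add(0b0101, 0b1001)   # 5 + (-7)
r2 = sub(0b0101, 0b1110)   # 5 - (-2)
```

The first result is the pattern "1110" (-2 when read as signed) and the second is "0111" (7), in agreement with the two worked cases above.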


12.5.2 Subtracters 


As described above, a subtracter can be obtained by simply combining a two’s complementer with an
adder. A circuit of this type is depicted in Figure 12.13(a), which allows two types of operations, defined
by a±b. When the operator (op) is '0', the outputs of the XOR gates are equal to the inputs, and cin='0',
so regular addition occurs. On the other hand, when op='1', the XOR gates complement the inputs, and
because now cin='1', a+b'+1=a-b results. A possible implementation for the XOR gate, based on
transmission gates (TGs, Figure 11.2), is shown in the inset on the right of Figure 12.13(a). Two alternatives for
overflow check are also included (one based on carry and the other based on operands, as described in
Section 3.2), where the system was considered to be signed.
FIGURE 12.13. Adder/subtracter circuits that compute (a) a±b and (b) ±a±b (the original signs of a and b
can be either).


Another circuit of this type is shown in Figure 12.13(b), which is a completely generic adder/subtracter
(it can perform all four operations defined by ±a±b with carry-in). A two’s complementer is now included
in the system with all inputs going through it and also going directly to the adder. A series of multiplexers,
controlled by the operators op_a and op_b, are then used to select which of these signals should
eventually enter the adder (addition is selected when the operator is low, and subtraction is selected when it
is high). The overflow check is similar to that in the previous circuit. A possible implementation for the
multiplexers (Section 11.6) is included in the inset on the right of Figure 12.13(b).


EXAMPLE 12.3 SIGNED/UNSIGNED ADDITION/SUBTRACTION


Suppose that a="1001", b="0101", and cin='0'. Determine the outputs produced by the circuit of 
Figure 12.13(b) for: 


op_a='0' and op_b='0'; 
op_a='0' and op_b='1'; 
op_a='1' and op_b='0'; 


a. Assume that it is a 4-bit unsigned system. 


b. Assume that it is a 4-bit signed system. 
SOLUTION


As in Example 12.1, the adder operation is independent from the system type (signed or unsigned); 
the only difference is in the way the numbers are interpreted. The results produced by the circuit of 
Figure 12.13(b) are the following: 


With op_a='0' and op_b='0': (a) + (b) = "1001" + "0101" = "1110";

With op_a='0' and op_b='1': (a) + (-b) = "1001" + "1011" = "0100";

With op_a='1' and op_b='0': (-a) + (b) = "0111" + "0101" = "1100".

a. Interpretation of the results above when the system is unsigned (so a=9 and b=5):
Expected: (a) + (b) = (9) + (5) = 14; Obtained: "1110" (correct).

Expected: (a) + (-b) = (9) + (-5) = 4; Obtained: "0100" (correct).

Expected: (-a) + (b) = (-9) + (5) = -4; Obtained: "1100" (=12, overflow).

b. Interpretation of the results above when the system is signed (so a=-7 and b=5):
Expected: (a) + (b) = (-7) + (5) = -2; Obtained: "1110" (correct).

Expected: (a) + (-b) = (-7) + (-5) = -12; Obtained: "0100" (=4, overflow).

Expected: (-a) + (b) = (7) + (5) = 12; Obtained: "1100" (=-4, overflow).
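
The three bit patterns of this example can be reproduced with a small behavioral model of the generic adder/subtracter. The sketch below is an assumption-laden illustration: `add_sub` and `as_signed` are hypothetical names, and overflow is flagged simply by comparing the interpreted output with the mathematically expected value, not with the carry-based or operand-based checks of Section 3.2.

```python
def add_sub(a, b, op_a, op_b, cin=0, n=4):
    """Model of the +/-a +/-b circuit: an operand is two's complemented when its operator is 1."""
    mask = (1 << n) - 1
    x = (-a) & mask if op_a else a & mask
    y = (-b) & mask if op_b else b & mask
    return (x + y + cin) & mask


def as_signed(x, n=4):
    """Interpret an n-bit pattern as a two's complement value."""
    return x - (1 << n) if x >> (n - 1) else x


a, b = 0b1001, 0b0101
outputs = [add_sub(a, b, op_a, op_b) for op_a, op_b in ((0, 0), (0, 1), (1, 0))]

# Signed interpretation: a = -7, b = 5
expected_signed = [-7 + 5, -7 - 5, 7 + 5]
signed_overflow = [as_signed(y) != e for y, e in zip(outputs, expected_signed)]
```

With these inputs, `outputs` holds the patterns "1110", "0100", and "1100", and `signed_overflow` marks the last two operations, matching part (b).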


12.6  Incrementer, Decrementer, and Two’s Complementer


12.6.1 Incrementer 


To increment a number (that is, add 1), the circuit of Figure 12.14(a) could be used, which is just a regular
adder (carry-ripple type in this case, Section 12.2) to which cin='1' and b="0...00" were applied, thus
resulting in s=a+1 at the output. This, however, is a very inefficient implementation because it employs
much more hardware than necessary. It is important to observe that when a number is incremented
by one unit the only thing that happens to it is that all bits up to and including its first (least significant) '0'
are inverted. For example, "00010111"+1="00011000". Therefore, a circuit like that in Figure 12.14(b) can
be used, which computes b=a+1 and is much more compact than the adder approach of Figure 12.14(a).


12.6.2 Decrementer 


A similar behavior occurs in a decrementer. To subtract one unit from a number is the same as adding
"1...11" to it. Therefore, all that needs to be done is to invert all bits up to and including its first (least
significant) '1'. For example, "00111100"-1="00111011". This can be achieved with the circuit of Figure 12.14(c),
which computes b=a-1 and is also much more compact than the adder-based approach.


12.6.3 Two’s Complementer 


Another circuit that falls in this category is the two’s complementer (needed to represent negative
numbers). Its operation consists of inverting all input bits then adding 1 to the result, that is, b=a'+1.
However, because a' = 2^N - 1 - a, the problem can be rewritten as
b = a' + 1 = 2^N - 1 - a + 1 = 2^N - 1 - (a - 1) = 2^N - 1 - a_decrem = (a_decrem)',
where a_decrem = a - 1 is the decrementer output. Consequently, all that is needed is to invert the outputs
of the decrementer, thus resulting in the circuit shown in Figure 12.14(d).
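
The three bit-inversion rules of this section can be checked with the Python sketch below, which operates on MSB-first bit strings like those used in the text. The function names are hypothetical, and the code models the behavior only, not the gate structure of Figure 12.14.

```python
def invert_through_first(bits, target):
    """Invert bits from the LSB up to and including the first bit equal to target."""
    out = list(bits)
    for i in range(len(out) - 1, -1, -1):     # scan from the LSB (rightmost character)
        original = out[i]
        out[i] = '1' if original == '0' else '0'
        if original == target:
            break
    return ''.join(out)


def increment(bits):
    """b = a + 1: invert up to and including the first '0'."""
    return invert_through_first(bits, '0')


def decrement(bits):
    """b = a - 1: invert up to and including the first '1'."""
    return invert_through_first(bits, '1')


def twos_complement(bits):
    """b = a' + 1, obtained by inverting the decrementer's output."""
    return ''.join('1' if b == '0' else '0' for b in decrement(bits))
```

For instance, `increment("00010111")` gives "00011000" and `decrement("00111100")` gives "00111011", as in the examples above; an all-ones input to the incrementer (or all-zeros to the decrementer) simply wraps around.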
FIGURE 12.14. (a) Area-inefficient adder-based incrementer implementation; more efficient (b) incrementer,
(c) decrementer, and (d) two’s complementer circuits.


12.7 Comparators 


An equality comparator is illustrated in Figure 12.15(a). The circuit compares two vectors, a and b, bit by
bit, using XOR gates. Only when all bits are equal is the output '1'.

Another comparator, which is based on an adder, is shown in Figure 12.15(b). It compares two
unsigned (nonnegative) numbers, a and b, by performing the operation a-b=a+b'+1. If the carry-out
bit of the last adder is '1', then a≥b. Moreover, if all sums are '0', then a and b are equal.

Finally, another unsigned magnitude comparator is shown in Figure 12.15(c), which, contrary to the
circuit in Figure 12.15(b), is not adder-based (it employs multiplexers instead, which are controlled by
XOR gates). By changing a reference bit from '0' to '1', the circuit can compute a>b or a≥b,
respectively.
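
The adder-based comparison can be verified numerically. The Python sketch below computes a+b'+1 on n bits and reads the two outputs exactly as described for the adder-based circuit; the function name and the dictionary keys are hypothetical.

```python
def compare_unsigned(a, b, n=4):
    """Adder-based comparison of two unsigned n-bit values via a + b' + 1."""
    mask = (1 << n) - 1
    total = (a & mask) + (~b & mask) + 1
    carry_out = (total >> n) & 1          # '1' means a >= b
    sums = total & mask                   # all zeros means a == b
    return {"a_ge_b": carry_out == 1, "a_eq_b": sums == 0}
```

For example, `compare_unsigned(0b1010, 0b0111)` reports a≥b true and a=b false, while swapping the operands makes both false.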


EXAMPLE 12.4 SIGNED COMPARATOR


Modify the traditional unsigned comparator of Figure 12.15(b) so it can process signed numbers. 


SOLUTION 


We saw in Section 3.2 that when adding two N-bit signed numbers to produce an (N+1)-bit output,
the last carry bit must be inverted if the signs of the inputs are different. Hence, for the circuit
of Figure 12.15(b) to be able to process signed numbers, c4 must be inverted when a3 and b3' are
different because they represent the signs of the values that actually enter the adder. The corresponding
truth table is shown in Figure 12.16(a), which shows the original and the corrected values of c4.
Note in the truth table that the upper part of the latter is equal to the lower part of the former and
vice versa. Therefore, if we simply invert c3, the right values are automatically produced for c4. The
resulting circuit is depicted in Figure 12.16(b). Because c4 now represents the sign of a-b, c4='0'
signifies that a≥b, while c4='1' implies that a<b (hence the latter appears in the figure). Finally,
observe that the inversion of c3 causes s3 to be inverted as well, so another inverter is needed to
correct s3, as shown in Figure 12.16(b).


FIGURE 12.15. (a) Equality comparator; (b) Unsigned adder-based magnitude and equality comparator;
(c) Unsigned mux-based magnitude comparator.


FIGURE 12.16. Signed comparator of Example 12.4.
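
The claim that inverting c3 yields the sign of a-b can be checked exhaustively. The sketch below is a behavioral Python model with hypothetical names: it forms the carry into the last stage, inverts it, and evaluates the carry-out of that stage with the usual majority function.

```python
def signed_a_less_than_b(a, b, n=4):
    """Return 1 when a < b for n-bit two's complement patterns, using the inverted-carry trick."""
    mask = (1 << n) - 1
    low_mask = (1 << (n - 1)) - 1
    nb = ~b & mask                                   # b' enters the adder, with cin = '1'
    # Carry into the last stage (c3 when n = 4), produced by the lower n-1 bits
    c_in_msb = ((a & low_mask) + (nb & low_mask) + 1) >> (n - 1)
    a_msb = (a >> (n - 1)) & 1
    nb_msb = (nb >> (n - 1)) & 1
    c_inv = c_in_msb ^ 1                             # the added inverter
    # Carry-out of the last full adder (majority of its three inputs)
    return (a_msb & nb_msb) | (a_msb & c_inv) | (nb_msb & c_inv)
```

As an example, for a="1001" (-7) and b="0101" (5) the function returns 1, that is, a<b.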
12.8  Arithmetic-Logic Unit


A typical arithmetic-logic unit (ALU) symbol is shown in Figure 12.17. It contains two main inputs (a, b)
plus an operation-select port (opcode) and a main output (y). The circuit performs general logical (AND,
OR, etc.) and arithmetic (addition, subtraction, etc.) operations, selected by opcode. To construct it, a
combination of circuits is employed, particularly logic gates, adders, decoders, and multiplexers.

An example of ALU specifications is included in Figure 12.17. In this case, opcode contains 4 bits, so
up to 16 operations can be selected. In the upper part of the table, eight logical operations are listed
involving inputs a and b, either individually or collectively. Likewise, in the lower part of the table, eight
arithmetic operations are listed, involving again a and b, either individually or collectively; in one of
them, cin is also included. Note that the MSB of opcode is responsible for selecting whether the logic (for
'0') or arithmetic (for '1') result should be sent out.

The construction of an ALU is depicted in Figure 12.18, which implements the functionalities listed in 
Figure 12.17. In Figure 12.18(a), a conceptual circuit is shown, which serves as a reference for the actual 
design. It contains one multiplexer in each section (logic at the top, arithmetic at the bottom), connected 
to an output multiplexer that selects one of them to be actually connected to y. 

Circuit details are shown in Figure 12.18(b). Thick lines were employed to emphasize the fact that
ALUs are normally multibit circuits. The upper (logic) section is a straight implementation of the
equations listed in the specifications of Figure 12.17 with all gates preceding the multiplexer. The arithmetic
section, however, was designed differently. Due to the adder’s large circuit (seen in Sections 12.2-12.3),
it is important not to repeat it, so the multiplexers were placed before the adder. This, on the other hand,
causes the instruction decoder, which takes opcode and converts it into commands for the switches
(multiplexers), to be more complex.

The figure also shows that a and b can be connected to the adder directly or through a two’s 
complementer (for subtraction, as discussed in Section 12.5). Note that the multiplexers also have an 
input equal to zero (an actual adder must not operate with inputs floating) and that multiplexer D is 
a single-bit mux. 

Finally, the specifications for the instruction decoder are listed in Figure 12.18(c), showing which
switches (multiplexer sections) should be closed in each case. This circuit can be designed using the
procedure presented in Section 11.5 (as in Example 11.4). Note that the MSB of opcode can be connected
directly to multiplexer E.


FIGURE 12.17. ALU symbol and ALU specifications example.


FIGURE 12.18. Circuit for the ALU specified in Figure 12.17: (a) Conceptual circuit; (b) Construction example;
(c) Specifications for the instruction decoder.


12.9 Multipliers 


Binary multiplication was discussed in Sections 3.4-3.5. The traditional multiplication algorithm of
Figure 3.8(a) is repeated in Figure 12.19. The inputs are a=a3a2a1a0 (multiplier) and b=b3b2b1b0
(multiplicand), while the output is p=p7...p1p0 (product). For N-bit inputs, the output must be 2N bits wide
to prevent overflow. Because the multiplication algorithm involves logical multiplication plus arithmetic
addition, AND gates plus full-adder units, respectively, can be used to implement it.
12.9.1 Parallel Unsigned Multiplier


A parallel-input parallel-output circuit (also known as array multiplier) that performs the operations
depicted in Figure 12.19 is shown in Figure 12.20. The circuit is combinational because its output depends
only on its current inputs. As expected, it employs an array of AND gates plus full-adder units. Indeed,
p0=a0b0, p1=a0b1+a1b0, p2=a0b2+a1b1+a2b0+carry(p1), etc. This circuit operates only with positive
(unsigned) inputs, also producing a positive (unsigned) output.
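
A behavioral Python sketch of this scheme is shown below: each partial product is formed with bitwise ANDs (one row of AND gates) and the shifted rows are then accumulated (the role of the full-adder array). The function name is hypothetical, and Python's integer addition stands in for the adder array.

```python
def array_multiply(a, b, n=4):
    """Unsigned n-bit by n-bit multiplication from AND-gate partial products."""
    product = 0
    for i in range(n):                       # one row per multiplier bit a_i
        a_i = (a >> i) & 1
        partial = 0
        for j in range(n):                   # AND gates: a_i AND b_j
            partial |= (a_i & ((b >> j) & 1)) << j
        product += partial << i              # row i is shifted i positions to the left
    return product                           # fits in 2n bits
```

For 4-bit inputs the result never exceeds 15 x 15 = 225, which fits in the 8-bit output p7...p0.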


FIGURE 12.19. Traditional unsigned multiplication algorithm. 




FIGURE 12.20. Parallel-input parallel-output unsigned multiplier. 
12.9.2 Parallel Signed Multiplier


If the system is signed (that is, contains positive and negative numbers, with the latter expressed in two’s
complement form), then the circuit of Figure 12.20 requires some modifications. The simplest way is by
converting any negative number into a positive value (by going through a two’s complementer) and
remembering the sign of the inputs. If the signs of a and b are different, then the multiplier’s output
must also undergo two’s complement transformation to attain a negative result. For example, to
multiply (3) × (-6), the following must happen: ("0011") × ("1010" → "0110") = ("00010010" → "11101110" = -18),
where the arrows indicate two’s complement transformations.
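
This first approach (convert to magnitudes, multiply unsigned, fix the sign afterwards) can be sketched in Python as follows. The function name is hypothetical, inputs and output are raw bit patterns, and the built-in `*` stands in for the unsigned multiplier.

```python
def signed_multiply(a, b, n=4):
    """Multiply two n-bit two's complement patterns; returns the 2n-bit product pattern."""
    mask_n = (1 << n) - 1
    mask_2n = (1 << (2 * n)) - 1
    sign_a = (a >> (n - 1)) & 1
    sign_b = (b >> (n - 1)) & 1
    # Two's complement any negative input to obtain its magnitude
    mag_a = (-a) & mask_n if sign_a else a & mask_n
    mag_b = (-b) & mask_n if sign_b else b & mask_n
    product = mag_a * mag_b                  # unsigned multiplier
    if sign_a != sign_b:                     # signs differ: negate the output
        product = (-product) & mask_2n
    return product
```

With a="0011" (3) and b="1010" (-6), the function returns the pattern "11101110" (-18), as in the example above.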

Another way of obtaining a signed multiplier was depicted in Figure 3.10 of Section 3.5, which
is repeated in Figure 12.21. Comparing Figures 12.21 and 12.19, we verify that several
most-significant partial products were inverted in the former and also that two '1's were included (shown
within gray areas) along the partial products. These small changes can be easily incorporated into the
hardware of Figure 12.20 to attain a programmable signed/unsigned circuit.


12.9.3 Parallel-Serial Unsigned Multiplier 


Figure 12.22 shows a mixed parallel-serial multiplier. One of the input vectors (a) is applied serially,
while the other (b) is connected to the circuit in parallel. The output (p) is also serial. Although this is not
a purely combinational circuit (it is a clocked circuit whose output does depend on previous inputs), it
is shown here to provide an integrated coverage of multipliers. Like the serial adder of Figure 12.12, the
circuit employs D-type flip-flops (DFFs, studied in Chapter 13) in the feedback loops to store the carry
bits. DFFs are also used to store the sum bits.


FIGURE 12.21. Signed multiplication algorithm (for positive and negative inputs).


FIGURE 12.22. Mixed parallel-serial unsigned multiplier (a sequential circuit).


EXAMPLE 12.5 PARALLEL-SERIAL MULTIPLIER FUNCTIONAL ANALYSIS


Figure 12.23 shows a 3-bit parallel-serial multiplier. Assuming that the system has been reset upon 
initialization, show that the result produced by this circuit is p="011110" when submitted to the 
inputs shown in the figure. 


FIGURE 12.23. 3-bit parallel-serial multiplier of Example 12.5.


SOLUTION 


Initially, q4=q3=q2=q1=q0='0'. After a0='1' is presented, the AND gates produce x2x1x0="110",
thus resulting in d4=x2='1', d3=carry(x1, q4, q3)='0', d2=(x1+q4+q3)='1', d1=carry(x0, q2, q1)='0',
and d0=(x0+q2+q1)='0'. These values of d are transferred to q at the next positive clock edge,
resulting in q4q3q2q1q0="10100". Next, a1='0' is presented, so the AND gates produce x2x1x0="000",
thus resulting in d4='0', d3='0', d2='1', d1='0', d0='1', which are transferred to q at the next rising
clock edge, that is, q4q3q2q1q0="00101". The last bit, a2='1', is presented next, so the AND gates
produce x2x1x0="110", resulting in d4='1', d3='0', d2='1', d1='0', d0='1', which are again stored by
the flip-flops at the next rising clock edge, that is, q4q3q2q1q0="10101". Now three zeros must be
entered in the serial input (a) to complete the multiplication. In these three clock cycles the following
values are produced for q4q3q2q1q0: "00101", "00001", and "00000". Because q0 is the actual
output, its sequence of values represents the product. From the values of q0 above, we conclude
that p="011110" (observe that the first value of q0 to come out represents p0, not p5; in summary,
"101" (=5) × "110" (=6) = "011110" (=30)).

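The cycle-by-cycle analysis of Example 12.5 can be replayed with a small Python simulation. The sketch assumes the register equations quoted in the solution (d4=x2, d3 and d2 from a full adder on x1, q4, q3, and d1 and d0 from a full adder on x0, q2, q1); the function names are hypothetical, and each loop iteration represents one rising clock edge.

```python
def full_adder(x, y, z):
    """Return (sum, carry) of three bits."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def parallel_serial_multiply(a_bits, b):
    """a_bits: serial input, LSB first (3 bits); b: 3-bit parallel input.

    Returns the list of q0 values after each clock edge, i.e., p0 first.
    """
    b2, b1, b0 = (b >> 2) & 1, (b >> 1) & 1, b & 1
    q4 = q3 = q2 = q1 = q0 = 0                      # system reset
    product_bits = []
    for a_i in list(a_bits) + [0, 0, 0]:            # three extra zeros flush the result
        x2, x1, x0 = a_i & b2, a_i & b1, a_i & b0   # AND gates
        d4 = x2
        d2, d3 = full_adder(x1, q4, q3)
        d0, d1 = full_adder(x0, q2, q1)
        q4, q3, q2, q1, q0 = d4, d3, d2, d1, d0     # clock edge
        product_bits.append(q0)
    return product_bits


p = parallel_serial_multiply([1, 0, 1], 0b110)      # a = "101" (5), b = "110" (6)
```

Here `p` is `[0, 1, 1, 1, 1, 0]`, that is, p0 through p5, which read MSB first gives "011110" (=30).
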

EXAMPLE 12.6 PARALLEL-SERIAL MULTIPLIER TIMING ANALYSIS 


Assume now that the circuit of Figure 12.23 is operating near its maximum frequency, so internal
propagation delays must be taken into consideration. Using the simplified timing diagram style
seen in Figure 4.8(b), draw the waveforms at all circuit nodes. Adopt a clock period of 10 ns and the
following propagation delays:


Through the AND gates: tp_AND = 1 ns.
Through the full-adder units: tp_carry = 1 ns for the carry and tp_sum = 2 ns for the sum.
Through the flip-flops: tpCQ = 1 ns.


SOLUTION


The corresponding timing diagram is shown in Figure 12.24. Gray shades were employed to highlight
the propagation delays (1 ns and 2 ns).


FIGURE 12.24. Timing diagram for the 3-bit parallel-serial multiplier of Figure 12.23.


12.9.4 ALU-Based Unsigned and Signed Multipliers 


Most dedicated hardware multipliers implemented from scratch (at transistor- or gate-level) are straight
implementations of the basic algorithm depicted in Figure 12.19 (for example, see the circuits of Figures 12.20
and 12.22). However, as seen in Sections 3.4-3.5, multiplication (and division) can also be performed using
only addition plus shift operations. This approach is appropriate, for example, when using a computer to do
the multiplications because its ALU (Section 12.8) can easily do the additions, while the control unit can easily
cause the data registers to be shifted as needed. To distinguish this kind of approach from those at the
transistor or gate level, we refer to the former as ALU-based algorithms.

Two such multiplication algorithms were described in Sections 3.4-3.5, one for unsigned numbers
(Figure 3.9) and the other for signed systems (Booth’s algorithm, Figure 3.12). In either case, the general
architecture is that depicted in Figure 12.25, with the ALU shown in the center. The product register
has twice the size of the multiplicand register, with no register needed for the multiplier because it is
loaded directly into the right half of the product register upon initialization (see Figures 3.9 and 3.12).
Depending on the test result, the ALU adds the multiplicand (it might also subtract it in the case of
Booth’s algorithm) to the left half of the product register and writes the result back into it. The control
unit is responsible for the tests and for shifting the product register as needed, as well as for defining the
operation to be performed by the ALU (add/subtract).


FIGURE 12.25. ALU-based multiplier (the algorithms are described in Figures 3.9 and 3.12 for unsigned and
signed systems, respectively).
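
For the unsigned case, the register-level behavior can be sketched in Python as below. This follows the common shift-and-add scheme (test the LSB of the product register, conditionally add the multiplicand to the left half, shift right); it is assumed, not verified here, to correspond to the algorithm of Figure 3.9. The function name is hypothetical.

```python
def shift_add_multiply(multiplicand, multiplier, n=4):
    """Unsigned shift-and-add multiplication with a 2n-bit product register."""
    mask = (1 << n) - 1
    product = multiplier & mask                      # multiplier loaded into the right half
    for _ in range(n):
        if product & 1:                              # test performed by the control unit
            product += (multiplicand & mask) << n    # ALU adds to the left half
        product >>= 1                                # shift right (any carry bit shifts in)
    return product


result = shift_add_multiply(5, 6)
```

With multiplicand 5 and multiplier 6, the register goes through 6, 3, 41, 60, and finally 30 after the four shifts.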


12.10 Dividers 


Binary division was studied in Sections 3.6-3.7. Due to the complexity of division circuits, ALU-based 
approaches are generally adopted. For unsigned systems, an ALU-based division algorithm was 
described in Figure 3.14, which utilizes only addition/subtraction plus logical shift. This algorithm can 
be implemented with the circuit of Figure 12.25, where the product register becomes the quotient plus 
remainder register. At the end of the computation, the quotient appears in the right half of that register, 
and the remainder appears in its left half. 

For signed systems, the common approach is to convert negative numbers into positive values,
then perform the division as if the numbers were unsigned. However, if the dividend and the divisor
have different signs, then the quotient must be negated (two’s complement), while the remainder
must always carry the same sign as the dividend (in other words, if the dividend is negative, then
the remainder must also undergo two’s complement transformation to convert it into a negative
number).
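
Both cases can be sketched in Python. The unsigned routine below is a generic shift-and-subtract (restoring) division that keeps the remainder in the left half and the quotient in the right half of a combined register; it is offered as an assumption about the kind of algorithm shown in Figure 3.14, not as a transcription of it. The signed wrapper applies the sign rules just described, working on ordinary Python integers rather than bit patterns. All names are hypothetical.

```python
def unsigned_divide(dividend, divisor, n=4):
    """Restoring division of two unsigned n-bit values; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    mask = (1 << n) - 1
    remainder, quotient = 0, dividend & mask         # left half, right half
    for _ in range(n):
        # Shift the combined register one position to the left
        remainder = (remainder << 1) | (quotient >> (n - 1))
        quotient = (quotient << 1) & mask
        if remainder >= divisor:                     # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


def signed_divide(dividend, divisor, n=4):
    """Signed division via magnitudes, then sign correction of quotient and remainder."""
    q, r = unsigned_divide(abs(dividend), abs(divisor), n)
    if (dividend < 0) != (divisor < 0):
        q = -q                                       # signs differ: negate the quotient
    if dividend < 0:
        r = -r                                       # remainder follows the dividend's sign
    return q, r
```

For example, `signed_divide(-7, 2)` returns `(-3, -1)`, so that (-3) × 2 + (-1) = -7.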


12.11 Exercises 


1. Full-adder operation


Check whether the CMOS full-adder circuit of Figure 12.2(b) actually complies with the truth table 
of Figure 12.1(c). 


2. Carry-ripple adder #1 
a. Briefly explain how the carry-ripple adder of Figure 12.3(b) works. 
b. Why is it more compact than other multibit adders? 


c. Why is it normally slower than other multibit adders? 
3. Carry-ripple adder #2


Figure E12.3 shows a 4-bit carry-ripple adder with a="0101", b="1101", and cin='1' applied to its 
inputs. Based on the FA’s truth table (Figure 12.1), write down the values produced for the sum and 
carry bits. Check whether the result matches the expected result (that is, 5+13+1=19). 


FIGURE E12.3. 


4. Carry-ripple adder #3 
Repeat the exercise above but now using the circuit of Figure 12.3(c).


5. Carry-ripple-adder timing diagram


Figure E12.5 shows a 3-bit carry-ripple adder with a="011", b="100", and cin='0' applied to its
inputs. Assuming that the propagation delay through the full-adder unit is tp_carry=3 ns for the carry
and tp_sum=4 ns for the sum, complete the timing diagram of Figure E12.5, where the carry-in bit
changes from '0' to '1'. Assume that the vertical lines are 1 ns apart. Does the carry-out propagation
delay accumulate?


FIGURE E12.5. 
6. Fast adder


Figure E12.6 shows one of the general approaches for the construction of fast adders. 


a. Briefly compare it to the carry-ripple adder (Figure 12.3). Why is it generally faster? Why does 
it require more silicon area to be implemented? Why is its usage limited to only a few bits? 


b. Suppose that a="0101", b="1101", and cin='1'. Write the values that the circuit must produce at
each node (sum and carry bits), then check whether the sum matches the expected result (that
is, 5+13+1=19).


FIGURE E12.6.


7. Manchester carry-chain adder 


a. Explain why the Manchester carry-chain circuit of Figure 12.6(a) falls in the general architecture
of Figure 12.5(a).


b. Verify its operation by applying a="0101", b="1101", and cin='1' to its inputs and calculating the 
corresponding carry-out bits. Do the results match the expected values? 


8. Carry-lookahead adder 


a. Explain why the carry-lookahead circuit of Figure 12.10 falls in the general architecture of
Figure 12.5(b).


b. Verify its operation by applying a="0101", b="1101", and cin='1' to its inputs and calculating
the corresponding sum and carry-out bits. Check whether the sum matches the expected value
(that is, 5+13+1=19).


9. Serial-adder timing diagram #1 


Figure E12.9 shows a serial adder to which a="0110" (=6) and b="0011" (=3) are applied. These
signals are included in the accompanying timing diagram. Assuming that it is operating at low
frequency (so the internal propagation delays are negligible), complete the timing diagram by drawing
the waveforms for cin, s, and cout (adopt the style seen in Figure 4.8(a)). Does the result match
the expected value (that is, 6+3=9)?


FIGURE E12.9.


10. Serial-adder timing diagram #2


Figure E12.10 shows again a serial adder to which a="0110" (=6) and b="0011" (=3) are applied (these
signals are shown in the accompanying timing diagram). Assuming that now the circuit is operating
near its maximum frequency, so internal propagation delays must be taken into consideration,
complete the timing diagram. Consider that the propagation delay through the adder is tp_carry=2 ns
for the carry and tp_sum=3 ns for the sum, and the propagation delay from clk to q in the DFF is
tpCQ=2 ns. Adopt the simplified timing diagram style seen in Figure 4.8(b), and assume that the clock
period is 10 ns. Does the result match the expected value (6+3=9)?


FIGURE E12.10. 


11. Serial-adder timing diagram #3


Repeat Exercise 12.9 for a="1100" (=12) and b="0110" (=6). Does the result match your expectation
(that is, 12+6=18)?


12. Serial-adder timing diagram #4


Repeat Exercise 12.10 for a="1100" (=12) and b="0110" (=6). Does the result match your expectation
(that is, 12+6=18)?


13. Incrementer
a. Write the Boolean expressions (in SOP format) for the incrementer seen in Figure 12.14(b).


b. Suppose that a="10110" (=22). Apply it to the circuit and check whether it produces the expected
result (that is, b=23).


c. Repeat part (b) above for a="11111" (=31). What is the expected result in this case?
14. Decrementer 
a. Write the Boolean expressions (in SOP format) for the decrementer seen in Figure 12.14(c). 


b. Suppose that a="10110" (=22). Apply it to the circuit and check whether it produces the expected 
result (that is, b=21). 


c. Repeat part (b) above for a="00000" (=0). What is the expected result in this case? 


15. Two’s complementer 
a. Write the Boolean expressions (in SOP format) for the two’s complementer seen in Figure 12.14(d). 


b. Suppose that a="01101" (=13). Apply it to the circuit and check whether it produces the expected 
result (that is, b="10011"=-13). 


c. Repeat part (b) above for a="11111" (=-1). What is the expected result in this case? 


16. Comparator equation and design 

Consider the 3-bit equality comparator of Figure 12.15(a). 

a. Write its Boolean expression without using the XOR operator (⊕). 

b. Draw a circuit for it using only NAND gates (plus inverters, if necessary). 


17. Unsigned comparator operation #1 


Verify the operation of the unsigned comparator of Figure 12.15(b) by applying the values below 
to its inputs and checking the corresponding results. What are the decimal values corresponding to 
a and b? 


a. a="1010", b="1101" 
b. a="1010", b="0111" 
c. a="1010", b="1010" 


18. Unsigned comparator operation #2 


Repeat the exercise above, this time for the comparator of Figure 12.15(c). Check both control options 
(a>b and a≥b). 


19. Complete unsigned comparator 


The unsigned comparator of Figure 12.15(b) has two outputs, one that is '1' when a= b and another 
that is '1' when a=b. Draw the gates needed for the circuit to also compute a Sb. 


20. Signed comparator operation 


Repeat Exercise 12.17 for the signed comparator of Figure 12.16(b). Note that some of the decimal 
values corresponding to a and b are now different (it is a signed system). 


21. Absolute-value comparator 


Based on the material in Section 12.7, design an absolute-value comparator (that is, a circuit that 
produces a '1' at the output whenever a=b or a=-b). 
22. Parallel multiplier operation 


Verify the operation of the array multiplier of Figure 12.20. Apply a="0101" (=5) and b="1101" 
(=13) to the circuit, then follow each signal and check whether the proper result occurs (that is, 
5×13=65). 


23. Parallel-serial multiplier operation 


Verify the operation of the parallel-serial multiplier of Figure 12.22. Consider the case of 3 bits, as 
in Figure 12.23. Apply a="011" (=3) and b="101" (=5) to the circuit, then follow the signals at each 
node during six clock periods and check whether the proper result occurs (that is, 3×5=15). 


24. Parallel-serial multiplier timing diagram 


Continuing the exercise above, assume that the adder is operating near its maximum frequency, 
so internal propagation delays must be taken into consideration (as in Example 12.6). Using 
the simplified timing diagram style seen in Figure 4.8(b), draw the waveforms at all of the adder’s nodes 
(as in Figure 12.24). Adopt a clock period of 30ns and the following propagation delays: 

Through the AND gates: tp=1ns. 
Through the full-adder units: tp_carry=3ns for the carry and tp_sum=4ns for the sum. 
Through the flip-flops: tpCQ=2ns. 


12.12 Exercises with VHDL 


See Chapter 21, Section 21.6. 


12.13 Exercises with SPICE 


See Chapter 25, Section 25.14. 
Registers 


Objective: Large, complex designs are normally synchronous, with sequential circuits generally 
accounting for a large portion of the system. To construct them, registers are needed. For that reason, the 
discussion on sequential circuits, which spans Chapters 13-15, starts with the study of registers. Such 
units can be separated into two kinds, called latches and flip-flops. The former can be further divided into 
SR and D latches, while the latter can be subdivided into SR, D, T, and JK flip-flops. All six are studied in 
this chapter, but special attention is given to the D latch and to the D flip-flop, because they account 
for nearly all register-based applications. 


Chapter Contents 


13.1 Sequential versus Combinational Logic 
13.2 SR Latch 
13.3 D Latch 
13.4 D Flip-Flop 
13.5 Master-Slave D Flip-Flops 
13.6 Pulse-Based D Flip-Flops 
13.7 Dual-Edge D Flip-Flops 
13.8 Statistically Low-Power D Flip-Flops 
13.9 D Flip-Flop Control Ports 
13.10 T Flip-Flop 
13.11 Exercises 
13.12 Exercises with SPICE 


13.1 Sequential versus Combinational Logic 


As described in Section 11.1, a combinational logic circuit is one in which the outputs depend solely 
on the current inputs. Thus the system is memoryless and has no feedback loops, as in the model of 
Figure 13.1(a). In contrast, a sequential logic circuit is one in which the outputs do depend on previ- 
ous system states, so storage elements are necessary, as well as a clock signal that is responsible for 
controlling the system evolution. In this case, the system can be modeled as in Figure 13.1(b), where a 
feedback loop, containing the storage elements, can be observed. 
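
The distinction can be illustrated with a minimal behavioral sketch. The names `combinational` and `Sequential`, and the choice of XOR as the logic function, are arbitrary assumptions made only for illustration.

```python
def combinational(a, b):
    """Memoryless block: the output depends only on the current inputs."""
    return a ^ b


class Sequential:
    """Block with a storage element in the feedback loop."""

    def __init__(self):
        self.state = 0                 # stored value (initially '0')

    def clock(self, x):
        # On each clock tick the next state is computed from the input
        # and from the present state, then stored.
        self.state = self.state ^ x
        return self.state


seq = Sequential()
print([combinational(1, 0) for _ in range(3)])   # [1, 1, 1]: same inputs, same output
print([seq.clock(1) for _ in range(3)])          # [1, 0, 1]: same input, history-dependent output
```
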

The storage capability in sequential circuits is normally achieved by means of flip-flops. Depending on 
their functionalities, such storage elements (globally referred to as registers) can be classified in one of the 
following four categories: D (data), T (toggle), SR (set-reset), or JK (Jack Kilby) flip-flops. However, modern 
designs employ almost exclusively the D-type flip-flop (DFF), with the T-type flip-flop (TFF) a 
distant second. TFFs are used mainly in the implementation of counters, while DFFs 
are general purpose flip-flops spanning a much wider range of applications, including counters, because 
a DFF can be easily converted into a TFF. For example, DFFs are normally the only kind of register 
prefabricated in programmable logic devices (CPLDs and FPGAs, Chapter 18). DFFs and TFFs are studied in 
detail in this chapter, while the other two are seen in the exercises section (Exercises 13.4-13.5). 

FIGURE 13.1. (a) Combinational versus (b) sequential logic. 

Latches constitute another popular kind of storage cell. They can be divided into two groups called 
D (data) and SR (set-reset) latches. Like the D-type flip-flop (DFF), the D-type latch (DL) finds many 
more applications than the SR latch (SRL). Both are studied in this chapter. 


13.2 SR Latch 


An SR latch (SRL) is a memory circuit with two inputs, called s (set) and r (reset), and two outputs, 
called q and q’. When s='1' occurs, it “sets” the output, that is, forces q to '1' and, consequently, q’ to '0'. 
Likewise, when r='1' occurs, it “resets” the circuit, that is, it forces q to '0' and, therefore, q’ to '1'. These 
inputs are not supposed to be asserted simultaneously. 

Two SRL implementations are depicted in Figure 13.2. In Figure 13.2(a), a NOR-based circuit is shown 
along with its truth table (where q* represents the next state of q), symbol, and corresponding CMOS 
circuit (review the implementation of NOR gates in Section 4.4). A NAND-based implementation is 
depicted in Figure 13.2(b), again accompanied by the respective truth table, symbol, and CMOS circuit 
(see CMOS NAND in Section 4.3). Note that the former operates with regular input-output values (as 
defined above), while the latter operates with inverted values (that is, a '0' is used to set or reset the cir- 
cuit), hence having its inputs represented by s’ and r’ instead of s and r. 
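
The behavior of the two cross-coupled structures can be checked with a small gate-level sketch. The function names are hypothetical, and the model simply re-evaluates the two gates a few times until the loop settles; it ignores delays and rejects the forbidden input combination.

```python
def nor(x, y):
    return int(not (x or y))


def nand(x, y):
    return int(not (x and y))


def sr_latch_nor(s, r, q):
    """Settled (q, q') of a NOR-based SR latch, given the present value of q."""
    if s and r:
        raise ValueError("s and r must not be asserted simultaneously")
    qn = nor(s, q)
    for _ in range(4):                 # a few passes are enough for the loop to settle
        q = nor(r, qn)
        qn = nor(s, q)
    return q, qn


def sr_latch_nand(s_n, r_n, q):
    """Settled (q, q') of a NAND-based SR latch; s_n and r_n are active low (s', r')."""
    if not s_n and not r_n:
        raise ValueError("s' and r' must not be asserted (low) simultaneously")
    qn = nand(r_n, q)
    for _ in range(4):
        q = nand(s_n, qn)
        qn = nand(r_n, q)
    return q, qn


print(sr_latch_nor(1, 0, q=0))    # (1, 0): set
print(sr_latch_nor(0, 0, q=1))    # (1, 0): hold
print(sr_latch_nand(1, 0, q=1))   # (0, 1): reset, because r'='0' is the active value
```
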

Unlike the circuits above, which are asynchronous, a clocked (or gated) version is presented 
in Figure 13.3. A NAND-based implementation is depicted in Figure 13.3(a). When clk='0', the latch 
remains in the “hold” state, whereas clk='1' causes it to operate as a regular SR latch. This means that if 
set or reset is asserted while the clock is low, it will have to wait until the clock is raised for the output 
to be affected. The corresponding truth table and symbol are shown in Figures 13.3(b)-(c). Another 
circuit for this same function, but requiring fewer transistors, is presented in Figure 13.3(d) (7 transistors 
in Figure 13.3(d) against 16 in Figure 13.3(a)). 
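
A behavioral sketch of the clocked SR latch follows. The function name `gated_sr_latch` is hypothetical; the model returns the next state only and does not represent either transistor-level implementation.

```python
def gated_sr_latch(clk, s, r, q):
    """Next state of a clocked (gated) SR latch, given the present state q."""
    if clk == 0:
        return q                       # hold: the inputs are ignored while the clock is low
    if s and r:
        raise ValueError("s and r must not be asserted simultaneously")
    if s:
        return 1                       # set
    if r:
        return 0                       # reset
    return q                           # s = r = '0': keep the stored value


q = 0
q = gated_sr_latch(clk=0, s=1, r=0, q=q)   # set asserted with the clock low: q stays at 0
q = gated_sr_latch(clk=1, s=1, r=0, q=q)   # the clock is raised: q becomes 1
print(q)
```
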


13.3 D Latch 

13.3.1 DL Operation 


A D-type latch (DL) is a level-sensitive memory circuit that is “transparent” while the clock is high and 
“opaque” (latched) when it is low or vice versa. Corresponding symbols and truth tables are depicted 
in Figures 13.4(a)-(b). The circuit has two inputs, called d (data) and clk (clock), and two outputs, q and 
its complement, q’. The DL in Figure 13.4(a) is transparent (that is, q=d) when clk='1', so it is said to 
be a positive-level DL (or, simply, positive DL), whereas that in Figure 13.4(b) is transparent when clk='0' 
(denoted by the little circle at the clock input), so it is known as negative-level DL (or, simply, negative DL). 
When a DL is transparent, its output is a copy of the input (q=d), while in the opaque state the output 
remains in the same state that it was just before the clock changed its condition (represented by q*=q in 
the truth tables, where q* indicates the DL’s next state). 

FIGURE 13.2. Asynchronous SR latch implementations: (a) NOR-based; (b) NAND-based. In each case the 
corresponding truth table, symbol, and CMOS circuit are also shown. 

FIGURE 13.3. Synchronous SR latch: (a) NAND-based implementation, (b) truth table, (c) symbol, and 
(d) SSTC-based implementation (studied in Section 13.3). 

FIGURE 13.4. Symbol and truth table for (a) positive- and (b) negative-level DLs. 
EXAMPLE 13.1 DL FUNCTIONAL ANALYSIS 


Figure 13.5(a) shows a positive DL. Assuming that it is submitted to the clk and d signals shown in 
Figure 13.5(b), draw the corresponding output waveform, q. Consider that the internal propagation 
delays are negligible (functional analysis). 


FIGURE 13.5. (a) D latch; (b) Functional analysis of Example 13.1; (c) Timing analysis of Example 13.2. 


SOLUTION 


The last plot of Figure 13.5(b) shows q. The DL’s initial state was assumed to be q='0'. During the 
semiperiods (1) and (3) the clock is low, so the circuit is opaque and retains its previous state, regardless 
of what happens with d. In (2) and (4) the DL is transparent, hence q is a copy of d (in this 
example the propagation delays were omitted; see Example 13.2 for a more realistic timing analysis, 
which is plotted in Figure 13.5(c)). 
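
The same kind of functional analysis can be sketched in code by sampling the waveforms at regular intervals. The function name `d_latch` and the sample values below are hypothetical (they are not the waveforms of Figure 13.5(b)); propagation delays are ignored, as in the example.

```python
def d_latch(clk_wave, d_wave, q0=0, positive=True):
    """Functional (zero-delay) response of a D latch to sampled clk and d waveforms."""
    q = q0
    q_wave = []
    for clk, d in zip(clk_wave, d_wave):
        transparent = (clk == 1) if positive else (clk == 0)
        if transparent:
            q = d                      # transparent: the output copies the input
        q_wave.append(q)               # opaque: the previous value is retained
    return q_wave


clk = [0, 0, 1, 1, 0, 0, 1, 1]         # two samples per semiperiod (illustrative values)
d   = [1, 0, 1, 0, 1, 1, 0, 1]
print(d_latch(clk, d))                 # [0, 0, 1, 0, 0, 0, 0, 1]
```
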


13.3.2 Time-Related Parameters 


Sequential circuits are clocked, so the clock period becomes a very important parameter because any 
sequential unit must complete its operations within a certain time window. DLs are sequential, so sev- 
eral time-related parameters must be specified to characterize their performance. These parameters fall 
into three categories, called contamination delays, propagation delays, and data-stable requirements. The four 
most important parameters, which belong to the last two categories, are described below and are illus- 
trated in Figure 13.6 (for a positive-level DL). 


tpCQ (propagation delay from clk to q): This is the time needed for a value that is already present in d 
to reach q when the latch becomes transparent (that is, when the clock is raised if it is a positive DL). 
Depending on the circuit, the low-to-high and high-to-low values of tpCQ can be different. 

tpDQ (propagation delay from d to q): This is the time needed for a change in d to reach q while the latch 
is transparent (that is, while the clock is high if it is a positive DL). Depending on the circuit, the 
low-to-high and high-to-low values of tpDQ can be different. 

tsetup (setup time): This is the time during which the input (d) must remain stable before the latch 
becomes opaque (that is, before the clock is lowered if it is a positive DL). 

thold (hold time): This is the time during which the input (d) must remain stable after the latch becomes 
opaque (that is, after the clock is lowered if it is a positive DL). 

FIGURE 13.6. Main DL time parameters. 


Note: Even though combinational circuits are not clocked, more often than not they are part of sequen- 
tial systems, so determining their propagation and contamination delays is indispensable (timing dia- 
grams for combinational circuits were introduced in Section 4.2—see Figure 4.8—and used extensively 
in Chapters 11 and 12). 


EXAMPLE 13.2 DL TIMING ANALYSIS 


Consider again the DL of Figure 13.5(a), for which the output waveform must be drawn. However, 
assume that the circuit is operating near its maximum frequency, so its propagation delays must be 
taken into account. Draw the waveform for q knowing that tpCQ=2ns and tpDQ=3ns. Assume that 
the clock period is 20ns and that the DL’s initial state is q='0'. To draw the waveform, adopt the 
simplified timing diagram style seen in Figure 4.8(b). 


SOLUTION 


The solution is depicted in Figure 13.5(c). Gray shades were employed to highlight the propagation 
delays (2ns and 3ns). 
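
The role of the two propagation delays can also be sketched with a simple event list. The function name `d_latch_timing`, the event format, and the event times below are assumptions made for illustration (they are not read from Figure 13.5); the model applies tpCQ when the latch opens and tpDQ when d changes while it is open, and it ignores setup and hold requirements.

```python
def d_latch_timing(events, t_pcq, t_pdq, clk=0, d=0, q=0):
    """Times (ns) at which q changes in a positive-level DL.

    events: list of (time, signal, new_value) with signal "clk" or "d".
    """
    scheduled = []
    for t, signal, value in sorted(events):
        if signal == "clk":
            if clk == 0 and value == 1:        # latch becomes transparent
                scheduled.append((t + t_pcq, d))
            clk = value
        elif signal == "d":
            d = value
            if clk == 1:                       # change in d while transparent
                scheduled.append((t + t_pdq, d))
    transitions = []
    for t, value in sorted(scheduled):
        if value != q:                         # keep only actual changes of q
            transitions.append((t, value))
            q = value
    return transitions


# Hypothetical stimulus with a 20ns clock period (clk high from 10 to 20 and from 30 to 40).
events = [(4, "d", 1), (10, "clk", 1), (14, "d", 0), (20, "clk", 0),
          (24, "d", 1), (30, "clk", 1), (40, "clk", 0), (44, "d", 0)]
print(d_latch_timing(events, t_pcq=2, t_pdq=3))   # [(12, 1), (17, 0), (32, 1)]
```
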


13.3.3 DL Circuits 


A series of D-type latches is examined below. They are divided into two main groups, called static 
(able to hold their state indefinitely as long as the power is not turned off) and dynamic (requiring periodic 
refreshing). The static latches are further divided into three subgroups, called multiplexer-based, RAM-type, 
and current-mode DLs. The complete list is shown in the table of Figure 13.7, which also contains, in the last 
column, the respective references. 
FIGURE 13.7. List of DLs that are examined in Section 13.3. 


13.3.4 Static Multiplexer-Based DLs 


The SOP (sum-of-products, Section 5.3) expression for any DL can be easily derived from the truth tables 
in Figure 13.4. For example, for the DL of Figure 13.4(a), q=clk’·q+clk·d results (notice that this function 
is recursive, as is typical of sequential circuits; see Exercise 11.1). Because this expression is similar 
to that of a multiplexer (Section 11.6), it immediately suggests an approach (called mux-based) for the 
construction of DLs. 

Several static multiplexer-based DLs are depicted in Figures 13.8(a)-(d). In Figure 13.8(a), a generic 
multiplexer is shown, with d feeding one input and q feeding the other, thus producing q=q when clk='0' 
and q=d when clk='1'. 
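
The mux-based expression for the positive-level DL can be verified exhaustively with a short sketch (the function name `dl_next_state` is hypothetical):

```python
def dl_next_state(clk, d, q):
    """Next state of a positive-level DL: q* = clk'.q + clk.d (a 2-to-1 multiplexer)."""
    return ((1 - clk) & q) | (clk & d)


for clk in (0, 1):
    for d in (0, 1):
        for q in (0, 1):
            expected = d if clk == 1 else q    # transparent when clk='1', hold otherwise
            assert dl_next_state(clk, d, q) == expected
```
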

In Figure 13.8(b), the STG (static transmission-gate-based) latch is shown. Here the multiplexer 
was constructed with TGs (transmission-gates—see the dark area in the first part of Figure 13.8(b)) 
surrounded by inverters operating as buffers. In the center, the circuit was reorganized to show its most 
common representation. Finally, a detailed schematic is shown on the right. When clk='1', the input 
switch is closed and the loop switch is open, causing the circuit to be transparent (thus q=d). When 
clk='0', the input switch is open and the loop switch is closed, so the ring formed by the two back-to-back 
inverters holds the previous value of d (that is, q=q). If the q’ output is used, noise from that node can 
reach the input node through the feedback path. To increase noise immunity, the extra inverter (drawn 
with dashed lines) can be used to produce q’. This latch is relatively compact and fast, though it requires 
both clock phases (clk and clk’) and has a total of four clocked transistors, which constitute a relatively 
large clock load. 

Another static DL is depicted in Figure 13.8(c). The only difference with respect to the latch in 
Figure 13.8(b) is that C²MOS logic (seen in Section 10.8) is employed in the feedback path instead 
of an inverter and a TG. This reduces the circuit size slightly with a very small speed reduction. For 
obvious reasons, it is called TG-C²MOS latch. Like the DL in Figure 13.8(b), this too is a popular latch 
implementation. 

Finally, in Figure 13.8(d) the multiplexer (latch) was constructed using conventional NAND gates. 
FIGURE 13.8. Static positive-level DL implementations. Circuits (a)-(d) are mux-based, (e)-(k) are RAM-type, 
and (l) and (m) are current-mode DLs. (a) Generic mux-based approach; (b) STG (static transmission-gate-based) 
latch (mux constructed with TGs); (c) TG-C²MOS latch (TG used in the forward path, C²MOS logic in the feedback 
path); (d) With mux constructed using NAND gates; (e) Traditional 6T SRAM cell (Section 16.2); (f) S-STG 
(simplified STG) latch with weak inverter in the feedback path; (g) Jamb latch, used in synchronizers; (h) and (i) 
S-CVSL (static cascode voltage switch logic) latch and its ratio-insensitive counterpart, SRIS (static ratio insensitive) 
latch; (j) and (k) SSTC1 and SSTC2 (static single transistor clocked) latches; (l) and (m) ECL and SCL latches 
(very fast DLs implemented with emitter- and source-coupled logic, respectively). 
13.3.5 Static RAM-Type DLs 


In the transmission-gate latches seen above, a switched ring of inverters constitutes the memory. RAM- 
type latches are based on the same principle. However, the ring is permanently closed (that is, there is 
no switch to open or close it), which reduces the size of the hardware but makes writing to the memory 
a little more difficult due to the contention that occurs between the input signal and the stored signal 
when their values differ. 

Several RAM-type latches are presented in Figures 13.8(e)-(k). Again, all examples are positive-level 
latches. However, contrary to the TG-based latches seen above, except for the circuit in Figure 13.8(f), all 
new circuits are single phase (that is, clk’ is not needed). 

The DL in Figure 13.8(e) is the classical 6T SRAM cell (discussed in Section 16.2—see Figure 16.1), 
with the ring of inverters (permanently closed) shown in the center. It is a very simple circuit, though it 
requires d and d’ (differential inputs) to minimize the effect caused by the absence of a loop switch. 
Both sides of the ring are active, that is, when clk='1' both sides are effectively connected to data signals, 
causing the transitions to be fast. As can be seen, this latch requires only one clock phase. 

A simplified version of the STG latch (referred to as S-STG) is shown in Figure 13.8(f). It is similar 
to the STG latch but without the loop switch, thus rendering a more compact (but slower) circuit. 
The feedback inverter is weaker than the forward one in order to reduce the contention between d 
and the stored value. The input switch can be a single transistor (in which case only one clock phase 
is necessary) or, to improve speed, a TG (a dual-phase clock is then needed). As will be shown later, 
the ring of inverters (without the input switch) is often used in flip-flops to staticize them, so it is 
sometimes referred to as a staticizer. 

The circuit in Figure 13.8(g) is known as jamb latch (indeed, all latches that operate with a ring of 
inverters fall somehow in this category). The jamb latch is normally employed in the implementation 
of flip-flops for synchronizing circuits. Contrary to the previous and next latches, all transistors in the 
cross-connected inverters are normally designed with the same size. As can be seen, data (d) are applied 
to only one side (the clocked one, so it is synchronous) of the memory ring, while reset (asynchronous) 
is connected to the other. In this case, a '0' that is preceded by a '1' can only be written into this DL by a 
reset pulse. 

We will describe later (Figure 13.9(e)) a dynamic latch that employs a kind of logic known as CVSL 
(cascode voltage switch logic), so it is referred to as the CVSL latch. Even though static CVSL does not exist, the 
latch of Figure 13.8(h) will be referred to as S-CVSL (static CVSL) because it is precisely the static coun- 
terpart of the true CVSL latch. Like the previous latch, this too is a compact single-phase circuit. Note 
that, for rail-to-rail input voltages, only one side of the ring is actually active (that with d or d' high). To 
reduce contention (and preserve speed), the nMOS transistors in the cross-connected inverters are made 
weaker than the pMOS ones (they serve simply to staticize the circuit). The other nMOS transistors, 
however, must be stronger than the pMOS. 

The circuit in Figure 13.8(i) is known as SRIS (static ratio insensitive) latch. It is the ratio-insensitive 
version of that in Figure 13.8(h), that is, in this case the nMOS transistors (in the vertical branches) do not 
need to be stronger than the pMOS ones. This is achieved with the addition of three p-type transistors 
in the circuit’s upper part. Notice that this is a positive-level DL. To obtain its negative counterpart, the 
pMOS transistors must be replaced with nMOS ones, and vice versa, and GND must be interchanged 
with VDD. In this new circuit it is the pMOS transistors that must be stronger than the nMOS ones, which 
is more difficult to accomplish than the other way around because nMOS is inherently about 2.5 to 3 
times stronger than pMOS (for same-size transistors; see Sections 9.2-9.3). It is in this kind of situation 
that the SRIS approach is helpful. However, it requires more silicon space than that in Figure 13.8(e), for 
example, with no relevant performance advantage in rail-to-rail applications. 
The latch in Figure 13.8(j) is similar to that in Figure 13.8(h) but with the clocked transistors merged 
to reduce clock loading. Because it contains only one clocked transistor, it is called SSTC1 (static 
single transistor clocked 1) latch. If operating with rail-to-rail voltages, it does not tolerate large data 
skew (delay between d and d’) or slow data transitions because then both inputs might be momen- 
tarily high at the same time, causing a brief short circuit over the inverters, which might corrupt the 
stored values. Another version (called SSTC2) of this latch is presented in Figure 13.8(k). Like all the 
other latches in Figure 13.8, this too is a positive-level DL. As will be seen in Section 13.4, to construct 
a flip-flop two DLs are required (in the master-slave approach), one being positive level and the 
other negative level. As will be shown, the architecture in Figure 13.8(k) eases the construction of the 
negative-level DL. 


13.3.6 Static Current-Mode DLs 


Finally, Figures 13.8(l)-(m) show two DLs that do not employ a ring of cross-connected inverters 
to store information. The two circuits are indeed similar, but while one employs bipolar transistors 
(npn only, which are faster; see Section 8.2), the other uses MOSFETs (nMOS only, also faster; see 
Section 9.2). The former employs emitter-coupled logic (ECL), so it is known as ECL latch. For a 
similar reason, because it employs source-coupled logic (SCL), the latter is referred to as SCL latch. 
These DLs operate with two differential amplifiers, one at the input, which is active when clk='1', 
and the other at the output (memory), which is active when clk='0'. They are the fastest structures 
currently in use, and when the transistors are fabricated using advanced techniques like those 
described in Sections 8.7 and 9.8 (GaAs, SiGe, SOI, strained silicon, etc.), flip-flops for prescalers 
(Section 14.6) operating with input frequencies over 30 GHz with SCL [Sanduleanu05, Kromer06, 
Heydari06] or over 50 GHz with ECL [Griffith06, Wang06] can be constructed. Their main drawbacks 
are their relatively high power consumption (because of the static bias current) and the wide silicon 
space needed to construct them. 


13.3.7 Dynamic DLs 


All DLs in Figure 13.8 are static (that is, they retain data even if the clock stops as long as the power is not 
turned off). To save space and increase speed, in many designs dynamic latches are preferred. The main 
drawback of dynamic circuits is that they need to be refreshed periodically (typically every few milli- 
seconds) because they rely on charge stored onto very small (parasitic) capacitors. Refreshing demands 
additional power and does not allow the circuit to operate with very low clock frequencies, which is 
desirable during standby/sleep mode. 

Several dynamic DLs are depicted in Figure 13.9. The first three are dual-phase circuits, while the 
other four are single phase. All seven are positive-level latches. In the DTG1 (dynamic transmission- 
gate-based 1) latch, depicted in Figure 13.9(a), the switch is followed by an inverter, so charge is stored 
onto the parasitic capacitor at the inverter’s input. Due to its buffered output, noise from the output 
node is prevented from corrupting the stored charge. Being compact and fast, this is a common dynamic 
DL implementation. 

The dynamic DL of Figure 13.9(b) has the switch preceded by an inverter, so charge is stored at the 
output node. Because of its buffered input and unbuffered output, only input noise is prevented from 
corrupting the stored charge. On the other hand, it allows tri-state (high-impedance, or floating-node) 
operation. Like the latch above, this circuit is also very compact and fast. Because it is dynamic and is 
also constructed with a TG, it is referred to as DTG2 latch. 
FIGURE 13.9. Dynamic positive-level DL implementations: (a) DTG1 (dynamic TG-based 1) latch with unbuffered 
input and buffered output; (b) DTG2 latch with buffered input and unbuffered output, allowing tri-state 
operation; (c) C²MOS latch, similar to (b), but with C²MOS logic instead of TG logic; (d) TSPC (true single 
phase clock) latch; (e) CVSL (cascode voltage switch logic) latch, which is the dynamic counterpart of that in 
Figure 13.8(h); (f) and (g) DSTC1 and DSTC2 (dynamic single transistor clocked) latches, which are the dynamic 
counterparts of those in Figures 13.8(j) and (k). 


The next DL (Figure 13.9(c)) is equivalent to that above, but it employs C²MOS logic (Section 10.8) to 
construct the inverter plus the switch (hence it is called C²MOS latch). It renders a slightly more compact 
circuit because the physical connection at the node between the inverter and the switch is split horizon- 
tally, diminishing the number of electrical contacts. However, due to the two transistors in series in each 
branch, it is slightly slower. 

The circuit in Figure 13.9(d) is known as TSPC (true single phase clock) latch, and it was one of the first 
latches to operate with a single clock phase. Like the previous latch, it too is dynamic and allows tri-state 
operation. Several variations of this structure exist, some of which are appropriate for the construction 
of relatively high-speed flip-flops, as will be shown in Section 14.6. 

The next latch (Figure 13.9(e)) employs a type of logic called CVSL (cascode voltage switch logic), so it is 
known as CVSL latch (its static counterpart was seen in Figure 13.8(h), from which the nMOS transistors of 
the cross-connected inverters were removed). The same observations made there are valid here. An extension 
of that architecture is depicted in Figure 13.9(f), which is called DSTC1 (dynamic single transistor clocked 1) 
latch. In it, the clocked transistors were merged to reduce the clock load. This circuit is the dynamic counter- 
part of the static SSTC1 latch seen in Figure 13.8(j), so here too the same observations apply. 

Finally, Figure 13.9(g) presents the DSTC2 (dynamic single transistor clocked 2) latch, which is 
the dynamic counterpart of the static SSTC2 latch seen in Figure 13.8(k). Like all the other latches in 
Figure 13.9, this too is a positive-level DL. As will be seen in Section 13.4, to construct a flip-flop two 
DLs are required (in the master-slave approach), one being positive-level and the other negative- 
level. The architecture in Figure 13.9(g) eases the construction of the negative-level DL. 
FIGURE 13.10. Symbol and truth table for (a) a positive-edge DFF and (b) a negative-edge DFF. 


13.4 D Flip-Flop 


As mentioned in the introduction of this chapter, D-type flip-flops (DFFs) are the most commonly used 
of all registers because they are perfectly suited for the construction of sequential circuits. Due to their 
importance, a detailed analysis is presented in this chapter, which is divided into several sections as 
follows: 


■ Fundamental DFF concepts (Section 13.4) 
■ Analysis of DFFs constructed using the master-slave technique (Section 13.5) 
■ Analysis of DFFs constructed using the short-pulse technique (Section 13.6) 
■ Analysis of dual-edge DFFs (Section 13.7) 
■ Analysis of statistically low-power DFFs (Section 13.8) 
■ Introduction of DFF control ports: reset, preset, enable, clear (Section 13.9) 


13.4.1 DFF Operation 


Unlike latches, which are level sensitive, flip-flops are edge sensitive. In other words, while a D 
latch is transparent during a whole semiperiod of the clock, a D flip-flop is transparent only during one 
of the clock transitions (either up or down). If the DFF transfers the input value to the output during 
the clock’s rising transition it is said to be a positive-edge triggered D flip-flop (or simply positive-edge DFF); 
otherwise, it is said to be a negative-edge triggered D flip-flop (or simply negative-edge DFF). Corresponding 
symbols and truth tables are depicted in Figure 13.10. 


EXAMPLE 13.3 DFF FUNCTIONAL ANALYSIS 


Figure 13.11(a) shows a positive-edge DFF to which the clk, rst, and d waveforms of Figure 13.11(b) 
are applied. Assuming that the propagation delay through the flip-flop is negligible (functional anal- 
ysis), draw the waveform for q. 


SOLUTION 


The last plot of Figure 13.11(b) shows q. The DFF’s initial state was assumed to be q='0'. Every time 
a positive clock edge occurs, the value of d is copied to q without any delay. When reset is asserted, 
the output is lowered immediately and asynchronously. 
FIGURE 13.11. (a) Positive-edge DFF; (b) Functional analysis of Example 13.3; (c) Timing analysis of 
Example 13.4. 
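
A functional model of a positive-edge DFF with asynchronous reset can be sketched as follows. The function name `dff` and the sample values are hypothetical (they are not the waveforms of Figure 13.11(b)); delays are ignored, as in Example 13.3.

```python
def dff(clk_wave, d_wave, rst_wave, q0=0):
    """Functional (zero-delay) response of a positive-edge DFF with asynchronous reset."""
    q = q0
    prev_clk = clk_wave[0]             # no edge is assumed at the first sample
    q_wave = []
    for clk, d, rst in zip(clk_wave, d_wave, rst_wave):
        if rst:
            q = 0                      # reset acts immediately, regardless of the clock
        elif prev_clk == 0 and clk == 1:
            q = d                      # rising edge: d is copied to q
        prev_clk = clk
        q_wave.append(q)
    return q_wave


clk = [0, 1, 0, 1, 0, 1, 0, 1]         # illustrative values, one sample per semiperiod
d   = [1, 1, 0, 0, 1, 1, 1, 1]
rst = [0, 0, 0, 0, 0, 0, 1, 0]
print(dff(clk, d, rst))                # [0, 1, 1, 0, 0, 1, 0, 1]
```
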


13.4.2 Time-Related Parameters 


DFFs are the most common sequencing units. Like DLs, DFFs are also characterized by a series of time 
parameters, which again fall into three categories, called contamination delays, propagation delays, and 
data-stable requirements. The three most important parameters, which belong to the last two categories, 
are described below and are illustrated in Figure 13.12 (for a positive-edge DFF). 


tpCQ (propagation delay from clk to q): This is the time needed for a value present in d to reach q when 
the clock is raised. Depending on the circuit, the low-to-high and high-to-low values of tpCQ can be 
different. 

tsetup (setup time): This is the time during which the input (d) must remain stable before the clock is 
raised. 

thold (hold time): This is the time during which the input (d) must remain stable after the clock is 
raised. 

FIGURE 13.12. Main DFF time parameters.

FIGURE 13.13. Flip-flops connected (a) directly and (b) with combinational logic in between.

Another important parameter is the actual width of the transparency window. Though theoretically
near zero, depending on the flip-flop implementation it might be relatively large. This is particularly true
when a technique called pulsed latch (described later) is employed to construct the DFF. The consequence
of a too large window can be observed with the help of Figure 13.13. In Figure 13.13(a), two DFFs are
interconnected directly without any combinational logic in between. If the transparency window is too 
large, the voltage transferred from d to x might have time to propagate through the second DFF as well 
(from x to q), causing an incorrect operation. This is less likely to happen in Figure 13.13(b) due to the 
extra delay from x to y caused by the combinational logic block. On the other hand, if the window is 
too short, d might not have time to reach x, which also would cause an error. To make things even more 
difficult, the transparency window in a pulsed latch varies with temperature and process parameters. In 
summary, careful and detailed timing analysis is indispensable during the design of sequential systems. 
In some cases, race-through is prevented by intentionally installing a block referred to as deracer (illus- 
trated later) between adjacent flip-flops. 
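
The two failure modes discussed above (window too large, window too short) can be summarized as a simple two-sided check. The sketch below is an assumption-laden illustration, not a design rule from the text: the function classify_window, its parameters (t_capture for the time d needs to reach x, t_cq_min for the shortest clk-to-output delay of the first flip-flop, t_logic_min for the shortest delay of the combinational block), and the numbers used are all hypothetical.

```python
def classify_window(t_window, t_capture, t_cq_min, t_logic_min=0):
    """Classify a transparency window for two cascaded flip-flops.

    All arguments use the same time unit (e.g., picoseconds).
    """
    if t_window < t_capture:
        return "too short"       # d does not have time to reach x
    if t_window > t_cq_min + t_logic_min:
        return "race-through"    # new x arrives while the next flip-flop is still transparent
    return "ok"


# Direct connection, as in Figure 13.13(a): no logic delay between the flip-flops
print(classify_window(200, t_capture=80, t_cq_min=120, t_logic_min=0))
# Combinational logic in between, as in Figure 13.13(b): extra delay from x to y
print(classify_window(200, t_capture=80, t_cq_min=120, t_logic_min=100))
```

With these hypothetical numbers the first call reports a race-through and the second reports a safe condition, which mirrors the qualitative argument made with Figure 13.13.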

The use of some of the time-related parameters described above is illustrated in the example that 
follows. 


EXAMPLE 13.4 DFF TIMING ANALYSIS


Consider again the DFF of Figure 13.11, for which the output waveform must be drawn. However, 
assume that the circuit is operating near its maximum frequency, so its propagation delays must be 
taken into account. Assume that the propagation delay from clk to q is tpCQ=2ns and from rst to q is
tpRQ=1ns, and that the clock period is 10ns. Consider that the DFF’s initial state is q='0' and adopt
the simplified timing diagram style seen in Figure 4.8(b) to draw the waveform for q. 


SOLUTION 


Figure 13.11(c) shows q. Gray shades were employed to highlight the propagation delays (1ns 
and 2ns). 
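
The effect of the two propagation delays can also be reproduced numerically. The sketch below uses the delays of Example 13.4 (2ns from clk to q, 1ns from rst to q) and a 10ns clock period, but the stimulus (values of d and the reset interval) is hypothetical because the actual waveforms are given only in Figure 13.11. As a simplification, a clock edge that occurs while rst is asserted is simply ignored.

```python
T_PCQ = 2   # ns, propagation delay from clk to q (Example 13.4)
T_PRQ = 1   # ns, propagation delay from rst to q (Example 13.4)


def q_transitions(clock_edges, d_at_edge, rst_intervals, q0=0):
    """List the (time, value) changes of q.

    clock_edges   : times (ns) of the rising clock edges
    d_at_edge     : value of d sampled at each of those edges
    rst_intervals : (start, end) times (ns) during which rst is asserted
    """
    events = []
    for t, d in zip(clock_edges, d_at_edge):
        # an edge that occurs while rst is asserted cannot load d
        if not any(start <= t < end for start, end in rst_intervals):
            events.append((t + T_PCQ, d))
    for start, _ in rst_intervals:
        events.append((start + T_PRQ, 0))   # reset lowers q after tpRQ
    events.sort()
    q, changes = q0, []
    for t, value in events:
        if value != q:
            q = value
            changes.append((t, q))
    return changes


# Hypothetical stimulus: rising edges every 10 ns, reset asserted from 38 ns to 42 ns
print(q_transitions([5, 15, 25, 35, 45], [1, 1, 0, 1, 1], [(38, 42)]))
# [(7, 1), (27, 0), (37, 1), (39, 0), (47, 1)]
```

Each clock-driven change of q appears 2ns after the corresponding edge, while the reset-driven change appears 1ns after rst is asserted.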


13.4.3 DFF Construction Approaches 


Figure 13.14 shows two classical approaches to the construction of DFFs. The architecture in 
Figure 13.14(a) is called master-slave, and it consists of two DLs connected in series with one trans- 
parent during one semiperiod of the clock and the other transparent during the other semiperiod. 
In Figure 13.14(a), the master DL is transparent when clk='0' (indicated by the little circle at the
clock input), while the slave DL is transparent when clk='1', so the resulting DFF is a positive-edge
DFF (the master’s output is copied to the actual output, q, by the slave when the latter becomes
transparent).

FIGURE 13.14. Construction techniques for D-type flip-flops: (a) With master-slave DLs; (b) With a pulsed DL.
DFF symbols for either case are depicted in (c).

A different construction technique is depicted in Figure 13.14(b), which employs only one DL. Its 
operation is based on a short pulse (represented by φ in the figure), derived from the clock, that causes
the latch to be transparent during only a brief moment. This approach is called pulsed latch or pulse-based 
flip-flop. Its main advantages are the reduced circuit size and the possibility to operate with negative 
setup times. On the other hand, the hold time is larger than that of a true (master-slave) DFF, which, as 
seen in Figure 13.13, might cause errors, so it must be carefully designed. 
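
The master-slave arrangement of Figure 13.14(a) can be mimicked behaviorally with two level-sensitive latches. The following is a minimal sketch in which the class names DLatch and MasterSlaveDFF and the stimulus are hypothetical, and all delays are ignored.

```python
class DLatch:
    """Level-sensitive D latch, transparent when the enable input equals `active`."""

    def __init__(self, active):
        self.active = active
        self.q = 0

    def update(self, d, en):
        if en == self.active:
            self.q = d          # transparent: output follows the input
        return self.q           # opaque: output holds the stored value


class MasterSlaveDFF:
    """Positive-edge DFF built from a negative-level master and a positive-level slave."""

    def __init__(self):
        self.master = DLatch(active=0)   # transparent when clk='0'
        self.slave = DLatch(active=1)    # transparent when clk='1'

    def update(self, d, clk):
        x = self.master.update(d, clk)
        return self.slave.update(x, clk)


dff = MasterSlaveDFF()
clk = [0, 0, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 0, 0, 0, 0, 1, 1]
print([dff.update(di, ci) for di, ci in zip(d, clk)])   # [0, 0, 1, 1, 1, 1, 0, 0]
```

In this run q takes, at each rising clock edge, the value that d had while clk was low, and changes of d that occur while clk is high are ignored, which is the expected edge-triggered behavior.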


13.4.4 DFF Circuits 


Several flip-flop implementations, including both architectures of Figure 13.14, will be examined in the 
sections that follow. The complete list is shown in the table of Figure 13.15, which contains also, in the 
last column, the respective references. 


13.5 Master-Slave D Flip-Flops 


As shown in Figure 13.14, master-slave is one of the techniques used to implement DFFs. Its main advan- 
tage over the other (pulse-based) technique is that the DFF’s timing behavior is simpler (safer). On the 
other hand, pulse-based DFFs tend to be more compact and might also consume less power. The former 
are examined in this section, while the latter are seen in the next. 


13.5.1 Classical Master-Slave DFFs 


Figure 13.16 shows four classical DFFs, all implemented using the master-slave approach (Figure 13.14(a)) 
and MOS technology (Sections 10.5-10.8). Their fundamental components are transmission-gates (TGs), 
inverters, and tri-state buffers (all seen in Figure 11.2). 
Category            Name           Description                                                Figure     Reference

Conventional        DTG            Dynamic TG-based flip-flop                                 13.16(a)   --
master-slave        C²MOS          Dynamic C²MOS flip-flop                                    13.16(b)   --
DFFs                STG            Static TG-based flip-flop                                  13.16(c)   --
                    TG-C²MOS       Static TG-C²MOS-based flip-flop                            13.16(d)   --

Special             TSPC           True Single Phase Clock flip-flop                          13.17(a)   [Yuan89]
master-slave        E-TSPC         Enhanced TSPC flip-flop                                    13.17(b)   [Huang96]
DFFs                GF-TSPC        Glitch-Free TSPC flip-flop                                 13.17(c)   [Huang96]
                    DSTC           Dynamic Single Transistor Clocked flip-flop                13.17(d)   [Yuan97]
                    SSTC           Static Single Transistor Clocked flip-flop                 13.17(e)   [Yuan97]
                    SAFF           Sense-Amplifier-based flip-flop                            13.17(f)   --
                    StrongARM      Flip-flop used in the StrongARM 110 processor              13.17(f)   [Montanaro96]
                    Nikolic SAFF   Modified Nikolic SAFF flip-flop                            13.17(f)   [Nikolic00]
                    --             Modified Strollo SAFF flip-flop                            --         --
                    ECL            Emitter-Coupled-Logic-based flip-flop                      13.17(g)   [Knapp01, Wang06]
                    SCL            Source-Coupled-Logic-based flip-flop                       13.17(h)   [Heydari06, Kromer06]
                    S-SCL          Simplified SCL flip-flop                                   13.17(h)   [Shu02, Peng07]

Pulse-based         EP/TG-C²MOS    Expl.-Pulsed TG-C²MOS-based flip-flop (NEC RISC proc.)     13.19(a)   [Kozu96]
DFFs                HLFF/Partovi   Hybrid Latch flip-flop or Partovi flip-flop (K6 proc.)     13.19(b)   [Partovi96]
                    K6             Self-resetting flip-flop also used in the AMD K6 proc.     13.19(c)   [Draper97]
                    SDFF/Klass     Semi-Dynamic flip-flop or Klass flip-flop                  13.19(d)   [Klass99]
                    DSETL          Dual-rail Static Edge-Triggered Latch                      13.19(e)   --
                    --             EP latch used in the Itanium 2 and Montecito processors    13.19(f)   --

Dual-edge           --             Mux-based dual-edge flip-flop                              --         --
DFFs                PIP-SRAM       Implicit-Pulsed SRAM-based dual-edge flip-flop             13.20(c)   [Moisiadis01]
                    DESPFF         Dual-Edge-triggered Static Pulsed flip-flop                13.20(e)   --

Statistically       GCFF           Gated-Clock flip-flop                                      13.21(a)   [Strollo00]
reduced power       CCFF           Conditional-Capture flip-flop                              13.21(b)   [Kong01]
DFFs                DCCER          Dif. Conditional-Capture Energy-Recovery flip-flop         13.21(c)   [Cooke03]

(-- marks an entry that is not legible here. The references [Gerosa04], [Naffziger02, 05], and [Ghadiri05]
also belong to this table, but their rows could not be identified.)

FIGURE 13.15. List of DFFs examined in Sections 13.5-13.8. 


The first circuit (Figure 13.16(a)) is dynamic and is obtained by cascading two dynamic DLs of 
Figure 13.9(a). An extra inverter can be placed at the input to reduce noise sensitivity. Because of the use 
of TGs (transmission-gates), it is called DTG (dynamic TG-based) flip-flop. When clk='0', the first TG is
closed and the second is open, so the master DL is transparent, causing node X to be charged with the 
voltage of d, whereas the slave DL is opaque, hence resulting at the output in a fixed voltage determined 
by the voltage previously stored on node Y. When clk transitions to '1', it opens the first TG, turning the 
master DL opaque, while the second TG closes and copies the voltage of X to Y (inverted, of course), 
which in turn is transferred to the output by the second inverter. In other words, the input is copied to 
the output at the positive edge of the clock. Ideally, when clk returns to '0', no path between d and q (that 
is, no transparency) should exist. However, that is not necessarily the case here because this circuit is 
sensitive to clock skew (see comments on clock skew ahead). This circuit is also sensitive to slow clock 
transitions, so careful timing analysis is indispensable. 

Figure 13.16(b) shows another conventional dynamic DFF, which employs the C²MOS latch of
Figure 13.9(c) to construct its master-slave structure. This flip-flop, contrary to that described above, 
is not susceptible to clock skew (this also will be explained ahead). However, due to the two transis- 
tors in series in each branch, it is slightly slower. 
FIGURE 13.16. Conventional master-slave positive-edge DFFs: (a) DTG (dynamic TG-based) flip-flop, which
employs the latch of Figure 13.9(a); (b) Dynamic C²MOS flip-flop, which uses the latch of Figure 13.9(c)
(two equivalent representations are shown); (c) STG (static TG-based) flip-flop, which employs the latch of
Figure 13.8(b); (d) Static TG-C²MOS-based flip-flop, using the latch of Figure 13.8(c). In (e), three alternatives
against clock skew and slow clock transitions are shown. All inverters are regular CMOS inverters.


One of the drawbacks of dynamic circuits is that they need to be refreshed periodically (every few 
milliseconds), which demands power. Due to their floating nodes, they are also more susceptible to 
noise than their static counterparts. On the other hand, static circuits are normally a little slower and 
require more silicon space. The DFF of Figure 13.16(c) is precisely the static counterpart of that seen in 
Figure 13.16(a), and it results from the association of two static DLs of Figure 13.8(b). Again, an extra 
inverter can be placed at its input to reduce noise sensitivity. For obvious reasons, it is called STG (static 
transmission-gate-based) flip-flop, and it is also a common DFF implementation. 

Finally, Figure 13.16(d) shows the TG-C²MOS flip-flop, which employs the static DL of Figure 13.8(c)
to build its master-slave structure. The circuit contains TGs in the forward path and C²MOS gates in the
feedback path. The latter reduces the circuit size slightly with a negligible impact on speed. This DFF 
was employed in the PowerPC 603 microprocessor. 


13.5.2 Clock Skew and Slow Clock Transitions 


Flip-flops are subject to two major problems: Clock skew and slow clock transitions. Clock skew effects can 
occur when the clock reaches one section of the circuit much earlier than it reaches others, or, in dual- 
phase circuits, when clk and clk’ are too delayed with respect to each other. Slow clock transition effects, 
on the other hand, can occur when the clock edges are not sharp enough. The main consequence of both 
is that a section of the circuit might be turned ON while others, that were supposed to be OFF, are still 
partially ON, causing internal signal races that might lead to incorrect operation. 
To illustrate the effect of clock skew, let us examine the DFF of Figure 13.16(a). When clk changes from
'0' to '1', the input is copied to the output. However, when clk changes from '1' to '0', we want the circuit 
to remain opaque. If the delay (skew) between clk and clk’ is significant, clk='0' might occur while clk’ is 
still low, hence causing the pMOS transistor in the slave TG to still be ON when the pMOS transistor of 
the master TG is turned ON. This creates a momentary path between d and q (undesired transparency) 
at the falling edge of the clock (incorrect operation). 

Contrary to the other three flip-flops in Figure 13.16, the dynamic C²MOS circuit in Figure 13.16(b) is
not susceptible to clock skew. This is due to the fact that in the latter the input signal is inverted in each
stage, so no path between d and q can exist when clk and clk’ are both momentarily low (note that when
clk=clk'='0' only the upper part of each DL is turned ON, so q cannot be affected). The other problem (slow
clock transition), however, is still a concern (it can affect all four flip-flops in Figure 13.16 as well as several 
from Figure 13.17) because it can cause all clocked transistors to be partially ON at the same time. 

Three alternatives against the problems described above are depicted in Figure 13.16(e). The first (and 
guaranteed) solution, shown on the left, is to employ nonoverlapping clocks. This, however, complicates 
clock distribution and reduces the maximum speed, being therefore of little interest for actual designs. 
The second solution (shown in the center) is the most common. Two inverters are constructed along each 
flip-flop, which sharpen the clock edges and are carefully designed to minimize clock skew. Finally, two 
pairs of inverters are shown on the right, which serve to shape the clock waveforms and also to delay 
the master’s clock with respect to the slave’s. Because in this case the slave enters the opaque state before 
the master changes to the transparent state, the momentary transparency during the high-to-low clock 
transition described above can no longer occur. This, however, increases the hold time, plus it demands 
more space and power. 


13.5.3 Special Master-Slave DFFs 


Several high-performance DFFs are depicted in Figure 13.17, all constructed using the master-slave 
approach of Figure 13.14(a). Notice that, except for the last two, all the others are single-phase circuits 
(that is, do not require clk’). 

A dynamic TSPC (true single phase clock) flip-flop is shown in Figure 13.17(a), which is constructed
using the DL of Figure 13.9(d). Due to internal redundancy, a total of three stages instead of four results
from the association of two such DLs. The absence of clk’ helps the design of high speed circuits
because it simplifies the clock distribution and also reduces the possibility of skew. The operation of
this DFF can be summarized as follows. During the precharge phase (clk='0'), node Y is charged to VDD,
regardless of the value of d. Y is then conditionally discharged to GND (if d='0') or not (if d='1') during
the evaluation phase (clk='1'). In the case of d='1', M1 is ON, discharging node X to GND, which turns
M5 OFF, preventing the discharge of Y when clk changes to '1', so with M7 and M8 ON, q'=0V results.
On the other hand, in the case of d='0', X is precharged to VDD (through M2-M3) while clk='0', turning
M5 ON, which causes Y to be discharged to GND (through M4-M5) when clk='1' occurs, hence raising
q’ to VDD (because M7 is OFF and M9 is ON).

In spite of its skew-free structure, the TSPC DFF of Figure 13.17(a) exhibits a major problem, which
consists of a glitch at the output when d stays low for more than one clock period. This can be observed
as follows. As seen above, if d='0', then node Y is discharged towards GND when clk='1' occurs, causing
q'='1' at the output. It was also seen that during the precharge phase node Y is always precharged to
VDD. If in the next cycle d remains low, X remains charged, keeping M5 ON. When clk='1' occurs again, it
turns M4 and M8 ON simultaneously. However, because Y will necessarily take some time to discharge,
for a brief moment M7 and M8 will be both ON (until the voltage of Y is lowered below the threshold
voltage of M7), thus causing a momentary glitch (toward 0V) on q’. A solution for this problem is shown
below. However, it is important to observe that if this DFF is used as a divide-by-two circuit (in a counter,
for example), then the glitch is not a problem (it does not occur), because the value of d changes after
every clock cycle. In summary, depending on the application, no modifications are needed in the TSPC
flip-flop of Figure 13.17(a).

FIGURE 13.17. Special master-slave positive-edge DFF implementations: (a) Dynamic TSPC (true single phase
clock) flip-flop, constructed with the latch of Figure 13.9(d) (skew free but subject to glitches); (b) E-TSPC is
an enhanced (faster) version of (a), still subject to glitches; (c) GF-TSPC is a glitch-free version of (b); (d) DSTC
(dynamic single transistor clocked) flip-flop, constructed with the latches of Figures 13.9(f) and (g); (e) SSTC
(static single transistor clocked) flip-flop, which employs the latches of Figures 13.8(j) and (k); (f) SAFF
(sense-amplifier-based flip-flop) or StrongARM flip-flop (with dashed transistor included) and their modified (only
the slave) versions for higher speeds; (g) The fastest DFF, which is constructed with the ECL (emitter-coupled
logic) latch of Figure 13.8(l); (h) Similar to (g), but with MOS transistors instead of bipolars (the latches are
from Figure 13.8(m); a simplified SCL results when the tail currents are suppressed, as indicated by the dashed
lines to GND).

An enhanced version (E-TSPC) of the DFF above is shown in Figure 13.17(b). In it, the switched
transistors were moved close to the VDD and GND rails (M2 interchanged with M3, M8 interchanged
with M7), resulting in a slightly faster circuit with fewer conflicting transistor-sizing requirements.
It also allows extra gates (for embedded logic) to be more easily incorporated into the flip-flop, which is
desirable to reduce propagation delays in high-speed applications. This circuit, however, is subject to the
same glitch problem described above.
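
The condition under which the TSPC and E-TSPC flip-flops glitch (d low in two consecutive clock cycles) is easy to express as a check over the sequence of captured data values. The sketch below is purely behavioral; the function name glitch_cycles and the data sequences are hypothetical.

```python
def glitch_cycles(d_samples):
    """Indices of the clock cycles in which a glitch on q' is expected.

    d_samples[k] is the value of d captured at the k-th positive clock edge.
    A glitch is expected when d is low in a cycle and was already low in the
    previous one (Y must be discharged again while q' is supposed to stay high).
    """
    return [k for k in range(1, len(d_samples))
            if d_samples[k] == 0 and d_samples[k - 1] == 0]


print(glitch_cycles([1, 0, 0, 0, 1, 0]))   # [2, 3]
print(glitch_cycles([0, 1, 0, 1, 0, 1]))   # []  (divide-by-two: d toggles every cycle)
```

The second call corresponds to the divide-by-two case mentioned above, in which d changes after every clock cycle and the glitch condition never arises.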

A glitch-free version (GF-TSPC) is presented in Figure 13.17(c). It requires three extra transistors (two for 
the CMOS inverter, plus one at the output), though they do not introduce significant extra delays because 
the inverter operates in parallel with the second stage, so only the output transistor needs to be considered. 
Due to its compactness, simple clock distribution, low power consumption, and the high performance of 
current (65nm) MOS technology, this, as well as the other TSPC DFFs above, have been used in the imple- 
mentation of prescalers (Section 14.6) with input frequencies in the 5GHz range [Shu02, Ali05, Yu05]. 

Another dynamic flip-flop, called DSTC (dynamic single transistor clocked), is shown in Figure 13.17(d), 
which is constructed with the DSTC1 (slave) and DSTC2 (master) latches of Figures 13.9(f)-(g). Note that 
the master has to be a negative-level DL for the resulting circuit to be a positive-edge DFF. This flip-flop is 
also very compact and needs only one clock phase. However, contrary to the TSPC flip-flops seen above, 
it does not operate using the precharge-evaluate principle, thus saving some power. Another interesting 
feature is that the output transitions of each latch are always high-to-low first, which helps the setup time 
of following DFFs, that is, when a positive clock edge occurs the flip-flop outputs can only remain as they 
are or have the high output go down before the other comes up. Because only a high input can affect any 
succeeding DFFs, the direct interconnection of such units is safe. The flip-flop of Figure 13.17(e) is the static 
counterpart of that in Figure 13.17(d), so it has lower noise sensitivity and does not require refreshing, at 
the expense of size and speed. A final note regards the names of these two flip-flops, which were derived 
from the latches used in their constructions; these have only one clocked transistor, so despite their names, 
the resulting flip-flops have a total of two clocked transistors each. 

Another popular DFF, called SAFF (sense-amplifier-based flip-flop), is depicted in Figure 13.17(f). The 
construction of the master stage is based on a sense amplifier similar to that used in volatile memories. 
Because it operates in differential mode, this circuit is able to detect very small differences (~100mV) 
between d and d’, so it can be used with low-swing (faster) logic. Its operation is based on the
precharge-evaluate principle. During the precharge phase (clk='0'), both nodes (s'=set, r'=reset) are raised to VDD.
Then in the evaluation phase (clk='1'), one of the nodes (that with the higher input voltage) is discharged
towards GND. These values (s’ and r’) are then stored by the slave SR-type latch. Though the circuit is fast, a
major consequence of the precharge-evaluate operation is its high power consumption because one of the
output nodes is always discharged at every clock cycle, even if the input data have not changed. Even
though the circuit as a whole is static (it can hold data even if the clock stops), this DFF is normally
referred to as semidynamic because the master is dynamic and the slave is static (this designation
is extended to basically all circuits whose master latch operates using the precharge-evaluate principle,
even when a staticizer is added to it).

The SAFF circuit described above was used in the StrongARM 110 processor but with the addition 
of a permanently ON transistor (M6), and operating with rail-to-rail input voltages. The role of M6 is to 
provide a DC path to GND for the discharged node (s’ or r’). Its operation is as follows. Suppose that 
d='1', so s’ is lowered during the evaluation phase. In this case, a path from s’ to GND exists through
M1-M3-M5. If, while clk is still high, d changes to '0', no DC path to GND will exist because then M3 
will be OFF. In this case, current leakage (important in present 65nm devices) might charge this floating 
node to a high voltage. Even though this is not a major problem because after evaluation any transition 
from low to high will not affect the information stored in the slave latch, it renders a pseudo-static circuit, 
which is more susceptible to noise. M6 guarantees a permanent connection between the discharged node 
and GND, now through M1-M6-M4-M5.
In the original SAFF the slave was implemented using a conventional SR latch (see Slave 1 in
Figure 13.17 and compare it to Figure 13.2(b)), whose delay is near two NAND-gate delays because q
depends on q' and vice versa. To reduce it, several improved SR latches were developed subsequently
and are also depicted in Figure 13.17(f). Notice in Slave 2 that the transistors drawn with solid lines
correspond to the original solution (that is, Slave 1), to which the other transistors (with dashed lines) were
added. Even though it is not immediately apparent from the circuit, now the values of q and q’ are
computed by q* = s + r'·q and q*' = r + s'·q' instead of q* = (s'·q')' and q*' = (r'·q)' (where q* is the next state of q).
In other words, q can be computed without the value of q' being ready, and vice versa, increasing the
slave’s speed. Notice, however, that this latch requires an extra inverter (extra delay) to compute s and
r from s' and r’. A slightly simpler solution, which does not require the inverter, is shown in the next SR
latch (Slave 3). This circuit, however, presents a major problem, which consists of a glitch at the output.
This can be observed as follows. Suppose that d='0', q='0', and q’='1', so r’ must be lowered during
evaluation, while s’ remains high, such that no alteration should occur on q and q’. However, clk='1'
occurs before r’ is lowered, which causes q’ to be momentarily lowered to '0' (see the second half of
Slave 3) before returning to its previous value ('1'). Note that even if clk in the slave were delayed the
problem would persist (then at the other end of the clock pulse). Another limitation of this circuit is the
presence of staticizers (rings of inverters), which reduce the speed slightly. A faster, glitch-free SR latch
is presented in Slave 4, which requires neither the inverters nor the staticizers.
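
The equivalence between the two sets of next-state equations given above, and the two-gate-delay settling of the conventional latch, can be checked by brute force. The sketch below is an illustration only: the function names are hypothetical, each pass of the loop stands for one NAND-gate delay (both gates are evaluated from the previous values), and the forbidden input s=r='1' is excluded.

```python
from itertools import product


def nand_latch(s, r, q):
    """Settle a cross-coupled NAND latch driven by s' = not s and r' = not r.

    Returns (settled q, number of gate-delay passes needed to settle).
    """
    s_n, r_n = 1 - s, 1 - r
    q_n = 1 - q
    passes = 0
    while True:
        new_q = 1 - (s_n & q_n)      # q*  = (s'.q')'
        new_q_n = 1 - (r_n & q)      # q*' = (r'.q)'
        if (new_q, new_q_n) == (q, q_n):
            return q, passes
        q, q_n = new_q, new_q_n
        passes += 1


def direct_next_state(s, r, q):
    """q* = s + r'.q, which does not need q' to be ready."""
    return s | ((1 - r) & q)


for s, r, q in product((0, 1), repeat=3):
    if s and r:
        continue                     # s = r = '1' is not a valid input
    settled, passes = nand_latch(s, r, q)
    print(s, r, q, settled, direct_next_state(s, r, q), passes)
```

For every valid input the settled value of the NAND latch coincides with q* = s + r'·q, and the cases in which the stored value changes take two passes, in agreement with the two NAND-gate delays mentioned above.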

Finally, Figures 13.17(g)-(h) show two similar circuits that employ CML (current-mode logic). As all 
the other DFFs in Figure 13.17, they too are master-slave flip-flops. The first is constructed with bipolar 
transistors and uses the static ECL (emitter-coupled logic) latch of Figure 13.8(l), while the second utilizes
MOS transistors and the SCL (source-coupled logic) latch of Figure 13.8(m). These circuits operate with 
differential amplifiers and, like the SAFF, are capable of detecting very small voltage differences between 
the inputs, hence allowing operation with small voltage swings, which increases the speed. The large 
stack of (four) transistors (one for Ibias, one for clk, one for d, plus another to implement the resistor, which
can also be implemented passively), however, makes this structure unsuitable for very low (~1V) supply 
voltages, in which case the tail currents are normally eliminated (indicated by the dashed lines to GND 
in the figure) with some penalty in terms of speed and noise (current spikes); this new structure is called 
S-SCL (simplified SCL). As mentioned in Section 13.3, ECL and SCL are the fastest flip-flops currently in 
use, and when the transistors are fabricated using advanced techniques like those described in Sections 8.7 
and 9.8 (GaAs, SiGe, SOI, strained silicon, etc.), flip-flops for prescalers (Section 14.6) operating with 
input frequencies over 15GHz with S-SCL [Ding05, Sanduleanu05], over 30GHz with SCL [Kromer06, 
Heydari06], or over 50GHz with ECL [Griffith06, Wang06] can be constructed. Their main drawbacks are 
their relatively high power consumption and the relatively wide silicon space needed to construct them. 


13.6 Pulse-Based D Flip-Flops 


The second technique depicted in Figure 13.14 for the construction of DFFs is called pulsed latch or pulse- 
based flip-flop. Its operation is based on a short pulse, derived from the clock, which causes a latch to be 
transparent during only a brief moment, hence behaving approximately as if it were a true flip-flop. 


13.6.1 Short-Pulse Generators 


Some short-pulse generators are depicted in Figure 13.18. The most common case appears in Figure 13.18(a), 
which consists of an AND gate to which clk and x (where x is an inverted and delayed version of clk) are 
applied, giving rise to φ. The delay relative to the inverters was labeled d1, while that relative to the AND
was called d2. The resulting waveforms are shown in the accompanying timing diagram. 
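
The AND-based generator can be reproduced with a small sampled-time model. In the sketch below the helper names (delayed, and_pulse_generator) and the numeric delays are hypothetical; d1 and d2 are expressed as integer numbers of samples, and the clock is an idealized square wave.

```python
def delayed(signal, delay, initial):
    """Delay a sampled signal by `delay` samples, padding the start with `initial`."""
    return [initial] * delay + signal[:len(signal) - delay]


def and_pulse_generator(clk, d1, d2):
    """phi = clk AND x, where x is clk inverted and delayed by d1; the AND adds d2."""
    x = delayed([1 - c for c in clk], d1, initial=1)
    return delayed([c & xi for c, xi in zip(clk, x)], d2, initial=0)


clk = ([0] * 5 + [1] * 5) * 2            # two clock periods of 10 samples each
phi = and_pulse_generator(clk, d1=2, d2=1)
print([t for t, v in enumerate(phi) if v])   # [6, 7, 16, 17]
```

With these values the clock rises at samples 5 and 15, and each pulse on φ starts d2 samples later and lasts d1 samples, so in this idealized model the pulse width is set by the inverter chain.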
FIGURE 13.18. Short-pulse generators: (a) AND-based, (b) AND-based with feedback, and (c) XOR-based
(frequency doubler).


Another short-pulse generator, also based on an AND gate, but having a feedback loop, is depicted 
in Figure 13.18(b). This circuit, employed in the implementation of a NEC RISC microprocessor (see 
EP/TG-C²MOS flip-flop in Figure 13.19(a)), produces relatively larger pulses. Its timing diagram is also
included in the figure. 

Finally, an XOR-based circuit is shown in Figure 13.18(c). Note that in this case x is a delayed but not 
inverted version of clk. As shown in the accompanying timing diagram, this circuit not only generates 
a short pulse but also doubles the clock frequency, being therefore useful for the implementation of 
dual-edge triggered DFFs. 
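
The frequency-doubling property of the XOR-based generator can be verified with the same kind of sampled-time model. The helper names and the delay value below are hypothetical, the gate's own delay is ignored, and the clock is an idealized square wave.

```python
def delayed(signal, delay, initial):
    """Delay a sampled signal by `delay` samples, padding the start with `initial`."""
    return [initial] * delay + signal[:len(signal) - delay]


def xor_pulse_generator(clk, d1):
    """phi = clk XOR x, where x is clk delayed by d1 (not inverted)."""
    x = delayed(clk, d1, initial=clk[0])
    return [c ^ xi for c, xi in zip(clk, x)]


def count_pulses(signal):
    """Number of 0-to-1 transitions in a sampled signal."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a == 0 and b == 1)


clk = ([0] * 5 + [1] * 5) * 2 + [0] * 5   # two full clock periods
phi = xor_pulse_generator(clk, d1=2)
print(count_pulses(clk), count_pulses(phi))   # 2 4
```

A pulse is produced at every clock transition, rising or falling, so φ contains twice as many pulses as clk has rising edges.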


13.6.2 Pulse-Based DFFs 


A series of pulse-based flip-flops (also called pulsed latches) are depicted in Figure 13.19. This type of circuit 
is often classified as explicit-pulsed (EP) or implicit-pulsed (IP), depending on whether the pulse generator 
is external to the latch or embedded in it, respectively. The former occupies more space and consumes 
more power, but if shared by several latches it might be advantageous. 

The first circuit in Figure 13.19 was used in a NEC RISC processor. It is a pulsed latch that belongs 
to the EP category. It is simply the TG-C²MOS latch of Figure 13.8(c) to which short pulses are applied
instead of a regular clock. The short-pulse generator is also shown and corresponds to that previously
seen in Figure 13.18(b) (with a latch-enable input included). Because this latch is dual-phase, φ and φ'
are both needed (note the arrangement used to keep these two signals in phase, which consists of two
inverters for φ and an always-ON TG plus an inverter for φ’).

The DFF in Figure 13.19(b) was originally called HLFF (hybrid latch flip-flop), because it is a latch
behaving as a flip-flop. It has also been called ETL (edge-triggered latch) or Partovi flip-flop. Despite its
name, it is a regular pulsed latch. The short-pulse generator is built-in, so the circuit belongs to the IP
category. Its operation is as follows. When clk='0', the NAND produces X='1'. Because M3 and M4 are
then OFF, q is decoupled from the main circuit, so the staticizer holds its previous data. When clk rises,
X either goes to '0' (if d='1') or remains high (if d='0'). This situation, however, only lasts until clk has
had time to propagate through the trio of inverters, after which X='1' again occurs. This constitutes the
transparency window (hold time) of the flip-flop. If X is lowered during that window, then M4 is turned
ON and M2 is turned OFF, causing q='1' at the output. Otherwise (that is, if X remains high during the
window), q goes to '0', because then M1, M2, and M3 are ON. Node X is static because it is a CMOS
NAND (Figure 4.10(b)), while node q is staticized by the pair of cross-connected inverters, so the overall
circuit is static. To prevent back-driven noise (from q’), an additional inverter (drawn with dashed
lines) can be used at the output. Note that this circuit allows operation with negative setup time (that
is, allows data to arrive after the positive clock transition has occurred). It was originally designed with
a transparency window of 240 ps and tpCQ=140 ps, resulting in a margin (negative setup time) of ~100 ps.
Though this prevents clock-skew effects and allows soft clock edges, a minimum of three gate delays
were needed in between two consecutive DFFs (as in Figure 13.13(b)) to prevent race-through. Note that,
contrary to conventional precharge circuits, this flip-flop has no node that is unconditionally discharged
at every clock cycle, thus saving power. This DFF was used in the AMD K6 microprocessor.
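
The negative-setup margin quoted above follows from simple arithmetic, sketched below. The 240 ps window and the 140 ps clk-to-q delay are the values given in the text; treating the margin as their plain difference, as well as the function names and the arrival times used, are assumptions made only for illustration.

```python
T_WINDOW_PS = 240   # transparency window quoted in the text
T_PCQ_PS = 140      # clk-to-q propagation delay quoted in the text


def negative_setup_margin(t_window, t_pcq):
    """Time after the clock edge during which late data can still be accepted."""
    return t_window - t_pcq


def data_is_captured(arrival_after_edge, t_window=T_WINDOW_PS, t_pcq=T_PCQ_PS):
    """True if data arriving `arrival_after_edge` ps after the edge is still captured.

    Negative arrival times mean that the data arrived before the clock edge.
    """
    return arrival_after_edge <= negative_setup_margin(t_window, t_pcq)


print(negative_setup_margin(T_WINDOW_PS, T_PCQ_PS))   # 100
print(data_is_captured(60), data_is_captured(150))    # True False
```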
FIGURE 13.19. Pulse-based positive-edge DFF implementations: (a) Explicit-pulsed TG-C²MOS-based flip-flop,
used in a NEC RISC processor (the latch is from Figure 13.8(c)); (b) HLFF (hybrid latch flip-flop), or Partovi
flip-flop, used in an AMD K6 processor; (c) Self-resetting DFF also used in an AMD K6 processor; (d) SDFF
(semi-dynamic flip-flop), or Klass flip-flop; (e) DSETL (dual-rail static edge-triggered latch); (f) EP latch used in
the Itanium 2 and Itanium Montecito processors (partial scan circuitry included, along with respective pulse
generator and deracer).
Another pulse-based DFF, also employed in an AMD K6 microprocessor, is depicted in Figure 13.19(c).
However, unlike all the previous circuits, this is a self-resetting DFF. This too is an IP circuit. Its
operation, based on the precharge-evaluate principle, is as follows. Suppose that clk='0', so M1 is OFF,
and assume that nodes X and Y (which are staticized by back-to-back inverters) have been precharged
to VDD. In this case, q=q'='0', so Z='1' results, which keeps M8-M9 OFF. When the clock rises, it turns
M1 ON. Because M2-M3 were already ON, the circuit evaluates, pulling X or Y down (depending on
whether d or d’ is high, respectively). This situation lasts only until the clock propagates through the
three inverters connected to M2-M3, after which these transistors are turned OFF. This time interval
constitutes the circuit's transparency window (hold time). When X (or Y) is lowered, q (or q’) is raised,
causing, after a brief delay (due to the NOR gate plus two inverters), Z='0', which turns M8-M9 ON,
precharging X and Y back to VDD, hence self-resetting both q and q’ to zero. After q and q’ are reset, Z
returns to '1', disabling M8-M9 once again. In other words, only a brief positive pulse occurs on q or q'
at every clock cycle.

Note that this circuit, as a whole, is static, because it can hold its state indefinitely if the clock stops. 
However, the original circuit, which operates in precharge-evaluate mode, is dynamic, to which two stat- 
icizers were added. For that reason, as mentioned earlier, this kind of architecture is normally referred 
to as semidynamic. 

The pulsed latch of Figure 13.19(d) is called SDFF (semi-dynamic flip-flop), also known as Klass flip- 
flop. The reason for the “semidynamic” designation is that explained above, being the actual circuit 
static. Its operation is as follows. When clk='0', M1 is OFF and M4 is ON, so X is precharged to VDD. This
keeps M7 OFF, and because M6 is also OFF (due to clk='0'), q is decoupled from X and holds the previous
data. When the clock rises, node X is either discharged (if d='1') or remains high (if d='0'). If X stays high, 
the NAND produces a '0' after the clock propagates through the pair of inverters, turning M3 OFF. This 
guarantees that X will remain high while clk is high even if d changes to '1' after the evaluation window 
has ended. This '1' turns M5 ON, so q='0' is stored by the staticizer. On the other hand, if X is discharged, 
as soon as it is lowered below the NAND’s threshold voltage it forces M3 to continue ON, even if the 
evaluation window ends, so M3 will complete the discharge of X. This allows the transparency window 
(hold time) to be shortened. X='0' turns M7 ON and M5 OFF, so q='1' is stored by the staticizer. Note 
that this circuit also allows negative setup time, but only for rising inputs. Therefore, though slightly 
faster than HLFF, this DFF is more susceptible to clock skew. 

Another fast and compact pulsed latch is depicted in Figure 13.19(e), originally called DSETL
(dual-rail static edge-triggered latch), which also belongs to the IP category. As before, the short pulses are
generated by a built-in AND-type gate (transistors M1 to M4). This DFF does not operate using the
precharge-evaluate principle (important for power saving) and accepts negative setup times. During the
transparency window (hold time), X and Y are connected to d and d', whose values are transferred to q
and q', respectively. Each output is staticized by an inverter plus an nMOS transistor. Note that this is
equivalent to the two-inverter staticizer seen before. (This can be observed as follows: when X='0', q too
is low, so on the right-hand side M6 is ON and M8 is OFF; when X='1', q is high, so this time M6 is OFF
and M8 is ON; in summary, M6 and M8 perform the function of the missing cross-connected inverter on
the right, while M5 and M7 do the same on the left.) Observe that the general architecture of this circuit
is RAM-type with a structure similar to the SRAM cell of Figure 13.8(e), just with short pulses applied to
the gating transistors instead of a regular clock.

Finally, Figure 13.19(f) shows a pulsed latch employed in the Itanium 2 microprocessor and,
subsequently, in the Itanium Montecito microprocessor (with some changes in the peripheral circuitry).
This latch is very simple. Note that, apart from the scan circuitry (which is intended for testability),
with Sh'='1' the latch reduces to that seen in Figure 13.8(c), just with one pMOS transistor removed for
compactness from the feedback C²MOS gate. The short-pulse generator for this explicit-pulsed latch is
also shown in the figure (it includes a latch-enable input). The deracer circuit, used between two DFFs
(as in Figure 13.13(b)) to prevent race-through, is also included.


13.7  Dual-Edge D Flip-Flops 


In modern high-speed systems, a large fraction of the power (~30%) can be dissipated in the clock dis- 
tribution network. In these cases, dual-edge flip-flops can be an interesting alternative because they are 
able to store data at both clock transitions, so the clock frequency can be reduced by a factor of two. 

Two classical implementation techniques for dual-edge DFFs are depicted in Figure 13.20. The first
technique (called conventional or multiplexer-based) is depicted in Figures 13.20(a)-(b), while the second
(called pulse-based) is illustrated in Figures 13.20(c)-(e).

As can be seen in Figure 13.20(a), the multiplexer-based approach consists of two DLs, one positive
and the other negative, connected to a multiplexer whose function is to connect the DL that is opaque
to the output (q). Basically any DL can be employed in this kind of circuit. An example, using the
TG-C²MOS latch of Figure 13.8(c) and the multiplexer of Figure 11.16(b), is shown in Figure 13.20(b).
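
The multiplexer-based technique can be checked with a short behavioral sketch. The Python fragment below is only an illustration under simplifying assumptions: the function name dual_edge_dff, the representation of clk and d as lists of samples, and the absence of propagation delays are all hypothetical choices, not part of the circuits in Figure 13.20.

```python
def dual_edge_dff(clk_wave, d_wave, q_pos=0, q_neg=0):
    """Zero-delay model of a mux-based dual-edge DFF (two D latches plus a multiplexer).

    clk_wave, d_wave: equal-length lists of 0/1 samples (an assumed representation).
    q_pos: state of the positive-level latch; q_neg: state of the negative-level latch.
    """
    out = []
    for clk, d in zip(clk_wave, d_wave):
        if clk == 1:
            q_pos = d          # positive-level latch is transparent while clk='1'
        else:
            q_neg = d          # negative-level latch is transparent while clk='0'
        # The multiplexer connects the latch that is currently opaque to the output.
        q = q_neg if clk == 1 else q_pos
        out.append(q)
    return out


clk = [0, 0, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 1, 0, 0, 1, 1, 0]
print(dual_edge_dff(clk, d))   # q changes only at clock transitions, rising and falling
```

In this model the output takes the value that d had just before each clock transition, in either direction, which is the behavior expected from a dual-edge flip-flop.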

Pulse-based dual-edge DFFs are similar to pulse-based single-edge DFFs, with the exception that now
the short-pulse generator must generate two pulses per clock period. This type of generator normally relies
on an XOR or XNOR gate (as opposed to AND/NAND gates), as previously illustrated in Figure 13.18(c).
Three examples are presented in Figure 13.20. In Figure 13.20(c), an IP (implicit-pulsed) circuit is shown.
As can be seen, the latch proper is simply the SRAM cell of Figure 13.8(e) with the switches implemented by
four nMOS transistors each, which are controlled by several delayed versions of the clock jointly
producing the desired dual pulse. The case in Figure 13.20(d), named DSPFF (dual-edge-triggered static pulsed
flip-flop), is again a straightforward application of the SRAM latch of Figure 13.8(e), this time using an
EP (explicit-pulsed) arrangement. Finally, the DESPFF (dual-edge-triggered static pulsed flip-flop) circuit
shown in Figure 13.20(e) is a direct application of the SSTC1 latch previously seen in Figure 13.8(j), and
employs the same dual-pulse generator seen in Figure 13.20(d).


FIGURE 13.20. Positive dual-edge flip-flops: (a) Conventional (mux-based) implementation technique,
exemplified in (b) using the TG-C²MOS latch of Figure 13.8(c) and the multiplexer of Figure 11.16(b); (c) Implicit-
pulsed SRAM-based latch (Figure 13.8(e)), whose pulse generator produces two pulses per clock period;
(d) DSPFF (dual-edge-triggered static pulsed flip-flop), which also employs the SRAM latch of Figure 13.8(e)
along with another dual-pulse generator; (e) DESPFF (dual-edge-triggered static pulsed flip-flop), which uses
the SSTC1 latch of Figure 13.8(j) and the same pulse generator shown in (d).


13.8 Statistically Low-Power D Flip-Flops 


Three main approaches have been taken recently to reduce the power consumption of flip-flops while 
trying to maintain (or even increase) their speeds. They are called energy recovery, clock gating, and con- 
ditional capture. 

The first approach (energy recovery) is based on dual transmission-gates with a pulsed power
supply [Athas94], though in practice it has contemplated mainly the use of a sinusoidal global clock [Voss01,
Cooke03], which is applied directly to some of the flip-flops studied above (either without any or with
minor modifications).

The second approach (clock gating) consists of blocking the clock when q and d are equal because then 
d does not need to be stored. 

Finally, conditional capture prevents certain internal transitions when q and d are equal.

The last two categories mentioned above belong to a general class called statistical power reduction 
because the power is only reduced if the amount of activity is low (typically below 20 to 30 percent), that 
is, if d does not change frequently. This is due to the fact that, besides the power needed to process q, 
additional power is necessary in these new circuits to feed the additional gates. 

The clock-gating approach is illustrated in Figure 13.21(a). As can be seen, the XOR gate enables the
AND gate only when d ≠ q. It is important to observe, however, that this approach is not recommended
for any DFF. For example, for truly static flip-flops (no precharge-evaluate operation) with only one or
two clocked transistors, the net effect of clocking the actual DFF might be not much different from that
of clocking the clock-gating AND gate.
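
The statistical nature of the clock-gating approach can be illustrated with a small behavioral sketch. The Python fragment below is not taken from the circuit of Figure 13.21(a); the function name gated_clock_pulses and the one-sample-per-clock-cycle representation of d are assumptions made only for illustration. It counts how many clock pulses actually reach the flip-flop when the clock is passed only while d differs from q.

```python
def gated_clock_pulses(d_stream, q_init=0):
    """Count the clock pulses that reach a DFF whose clock is gated by (d XOR q).

    d_stream: one value of d (0 or 1) per clock cycle (an assumed representation).
    Returns (number of pulses that reached the DFF, final value of q).
    """
    q = q_init
    pulses = 0
    for d in d_stream:
        enable = d ^ q      # XOR gate: '1' only when d differs from q
        if enable:          # AND gate lets this clock pulse through
            pulses += 1
            q = d           # the DFF stores d at the gated clock edge
        # otherwise the clock is blocked; q already equals d, so nothing is lost
    return pulses, q


data = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
pulses, q_final = gated_clock_pulses(data)
print(pulses, "of", len(data), "clock pulses reach the flip-flop")
```

When d changes rarely, few pulses get through, which is where the saving comes from; when d changes at nearly every cycle, almost all pulses pass and the XOR and AND gates only add overhead.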


FIGURE 13.21. Flip-flops with statistical power reduction: (a) GCFF (gated-clock flip-flop); (b) CCFF (conditional-
capture flip-flop); (c) DCCER (differential conditional-capture energy-recovery) flip-flop.


The conditional-capture approach is depicted in Figure 13.21(b). As in the case above, it too blocks
unnecessary internal transitions, but its action does not involve the clock signal. In the case of Figure 13.21(b),
the first stage is a pulsed latch, while the second is an unclocked SR latch. The circuit operates using the
precharge-evaluate principle, so two additional nMOS transistors, controlled by NOR gates, are inserted
between the precharge transistors and the input transistors. This arrangement prevents the set/reset nodes
from being discharged during evaluation when q=d (that is, in this case d is not captured).

Another conditional-capture flip-flop is shown in Figure 13.21(c). The first stage is a dynamic implicit-
pulsed latch that again operates using the precharge-evaluate principle and has its outputs stored by a
conventional SR latch. As usual, the lower part constitutes the short-pulse generator, while the upper
part (pMOS transistors) contains the precharge transistors (in this case these transistors are permanently ON
instead of being clocked). Conditional capture is achieved with the use of d in series with q' and d'
in series with q, preventing the set/reset nodes from being discharged when d=q. Energy recovery is
achieved by using a sinusoidal clock instead of a square-wave clock.


13.9  D Flip-Flop Control Ports 


Finally, we discuss the introduction of control ports to DFFs, namely for reset, preset, enable, and clear. 


13.9.1 DFF with Reset and Preset 


In many applications it is necessary to reset (that is, force the output to '0') or preset (force the output to '1')
the flip-flops. These inputs (reset, preset) can normally be introduced with very simple modifications in
the original circuit. As an example, Figure 13.22(a) shows the conventional DFF seen in Figure 13.16(c),
which had its inverters replaced with 2-input NAND gates. It is easy to verify that rst'='0' forces q='0',
whereas pre'='0' forces q='1' (in this example, if rst'=pre'='0', then pre' wins). The corresponding DFF
symbol is also included in the figure.


FIGURE 13.22. Introduction of flip-flop-control inputs: (a) reset and preset, (b) enable, and (c) clear.


13.9.2 DFF with Enable 


Flip-flops may also need an enable input. When enable is asserted, the DFF operates as usual; when not, 
the circuit retains its state (that is, remains opaque). A basic way of introducing an enable port is by 
means of a multiplexer (Section 11.6), as shown in Figure 13.22(b). The symbol shown on the right cor- 
responds to the flip-flops normally available in CPLD/FPGA chips (Chapter 18), which contain, besides 
the indispensable in/out ports (d, clk, and q), reset, preset, and enable ports. (For another way of intro- 
ducing enable, see Exercises 13.19 and 13.20.) 


13.9.3 DFF with Clear 


In our context, the fundamental difference between reset and clear is that the former is asynchronous (that
is, does not depend on clk), while the latter is synchronous. The introduction of a flip-flop clear port is
illustrated in Figure 13.22(c), which consists of simply ANDing d and clr'. When clr'='1', the DFF
operates as usual, but if clr'='0', then the output is zeroed at the next rising edge of the clock.
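
The behavior of the control ports discussed in Section 13.9 can be summarized in a short behavioral model. The Python sketch below is an illustration only: the class name DFF, the method update, and the argument names are hypothetical, propagation delays are ignored, and combining all four ports in a single flip-flop (with clear applied after the enable multiplexer) is an assumption made here, not something stated for the circuits of Figure 13.22.

```python
class DFF:
    """Zero-delay model of a positive-edge DFF with rst', pre', clr' (active low) and ena."""

    def __init__(self, q=0):
        self.q = q
        self._clk = 0                      # previous clock level, used to detect rising edges

    def update(self, clk, d, rst_n=1, pre_n=1, clr_n=1, ena=1):
        rising = (self._clk == 0 and clk == 1)
        self._clk = clk
        if pre_n == 0:                     # asynchronous preset; wins over reset, as in the example of Figure 13.22(a)
            self.q = 1
        elif rst_n == 0:                   # asynchronous reset: acts regardless of clk
            self.q = 0
        elif rising:
            mux_out = d if ena else self.q # enable multiplexer: pass d, or recirculate q
            self.q = mux_out & clr_n       # synchronous clear: d ANDed with clr'
        return self.q


ff = DFF()
ff.update(0, 1)
print(ff.update(1, 1))            # rising edge with ena='1': d is stored
ff.update(0, 0, ena=0)
print(ff.update(1, 0, ena=0))     # rising edge with ena='0': state is retained
print(ff.update(1, 0, rst_n=0))   # reset acts immediately, with no clock edge
```

The last call shows the distinction made in the text: reset takes effect without any clock edge, whereas clr'='0' would only zero the output at the next rising edge.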


13.10  T Flip-Flop


The internal operation of most counters is based on the "toggle" principle. A toggle flip-flop (TFF) is
simply a circuit that changes its output (from '0' to '1' or vice versa) every time a clock edge (either
positive or negative, depending on the design) occurs, remaining in that state until another clock edge of the
same polarity happens. This is illustrated in Figure 13.23 for a positive-edge TFF. Notice in the truth table
that t is simply a toggle-enable input, so the circuit operates as a TFF when t='1', or it remains in the same
state (represented by q*=q) when t='0'.

CPLDs and, especially, FPGAs (Chapter 18) are rich in DFFs, which are used as the basis to implement
all kinds of flip-flops. Indeed, a DFF can be easily converted into a TFF, with four classical conversion
schemes depicted in Figure 13.24.

In Figure 13.24(a), a DFF is converted into a TFF by simply connecting an inverted version of q to d 
(this circuit was seen in Example 4.9). In this case, the flip-flop has no toggle-enable input. 

In Figure 13.24(b), two options with toggle-enable capability are shown. In the upper figure, an
inverted version of q is connected to d, and the enable (ena) input of the DFF is used to connect the
toggle-enable signal (assuming that the DFF does have an enable port). In the lower diagram of
Figure 13.24(b), the DFF has no enable input, hence an XOR gate is needed to introduce the
toggle-enable signal. In both cases, if toggle_ena='1', then the circuit changes its state every time a positive
clock edge occurs, or it remains in the same state if toggle_ena='0' (toggle-enable is represented simply
by t in the TFF symbol shown on the right). Finally, in Figure 13.24(c), the circuit has an additional
input, called clear (see Figure 13.22(c)); recall also that the fundamental difference between reset and clear
is that the former is asynchronous while the latter is synchronous. If clear='0', then the output is
synchronously forced to '0' at the next positive transition of the clock.


FIGURE 13.23. Symbol and truth table for a positive-edge TFF (t is just a toggle-enable port).


FIGURE 13.24. Conversion schemes from DFF to TFF: (a) Without toggle-enable input; (b) With toggle-enable input;
(c) With toggle-enable and clear inputs. The toggle-enable input is represented by t in the symbols on the right.
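
The XOR-based conversion can be sketched in a few lines. The Python fragment below is an illustrative model only; it assumes that the XOR gate computes d from q and the toggle-enable signal (d = q XOR t), which is the usual arrangement but is inferred here rather than read from Figure 13.24(b). The names tff_next and run_tff are hypothetical, and delays are ignored.

```python
def tff_next(q, t):
    """Value stored by the DFF at the next positive clock edge when d = q XOR t."""
    d = q ^ t          # t='1' inverts q (toggle); t='0' feeds q back unchanged (hold)
    return d


def run_tff(t_stream, q=0):
    """Apply one positive clock edge per entry of t_stream and record q after each edge."""
    out = []
    for t in t_stream:
        q = tff_next(q, t)
        out.append(q)
    return out


print(run_tff([1, 1, 0, 1, 0, 0, 1]))   # q toggles only at the edges where t='1'
```

With t permanently at '1' the model reduces to the circuit of Figure 13.24(a), in which q' is connected directly to d.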


EXAMPLE 13.5 ASYNCHRONOUS COUNTER


Figure 13.25(a) shows two negative-edge TFFs connected in series. The clock is applied only to the
first stage, whose output (q0) serves as clock to the second stage. Using the clock waveform as the
reference, draw the waveforms for the output signals (q0 and q1).


FIGURE 13.25. Asynchronous counter of Example 13.5.


SOLUTION


The solution is depicted in Figure 13.25(b). Because the circuits are negative-edge TFFs, arrows were
used in the corresponding waveforms to highlight the only points where the flip-flops are
transparent. The TFFs' initial state was assumed to be q0=q1='0'. The first stage is similar to that in Example
4.9 (except for the fact of being now negative-edge), so a similar analysis produces the waveform
for q0 shown in the second plot of Figure 13.25(b), a little delayed with respect to clk. The same can
be done for the second stage, which has q0 as clock, resulting in the waveform shown in the last
plot, with q1 a little delayed with respect to q0. Looking at q0 and q1, we observe that this circuit is
an upward binary counter because the outputs are q1q0="00" (=0), then "01" (=1), then "10" (=2),
then "11" (=3), after which it restarts from "00". This circuit is called asynchronous because clk is not
connected to all flip-flops (counters will be discussed in detail in Chapter 14). As can be observed in
the plots, a counter is indeed a frequency divider because f_q0=f_clk/2 and f_q1=f_clk/4.
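
The count sequence of Example 13.5 can be reproduced with a small behavioral sketch. The Python fragment below is an illustration only: the function name async_counter and the sampled-clock representation are assumptions, and the propagation delays visible in Figure 13.25(b) are ignored.

```python
def async_counter(clk_wave, q0=0, q1=0):
    """Two negative-edge TFFs in series; q0 serves as clock to the second stage.

    clk_wave: list of 0/1 clock samples (an assumed representation).
    Returns the pair (q1, q0) observed after each falling edge of clk.
    """
    states = []
    prev_clk = clk_wave[0]
    for clk in clk_wave[1:]:
        if prev_clk == 1 and clk == 0:       # falling edge of clk toggles the first stage
            prev_q0 = q0
            q0 ^= 1
            if prev_q0 == 1 and q0 == 0:     # falling edge of q0 toggles the second stage
                q1 ^= 1
            states.append((q1, q0))
        prev_clk = clk
    return states


clk = [0, 1] * 5 + [0]                       # five complete clock periods
print(async_counter(clk))                    # (q1, q0) after each falling clock edge
```

The printed pairs follow the sequence "01", "10", "11", "00", "01", so q0 completes one period every two clock periods and q1 one period every four, in agreement with the frequency-division remark above.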


13.11  Exercises


1. From SR latch to D latch


Even though from a VLSI design perspective a D latch is rarely implemented with conventional gates
(see Figures 13.8 and 13.9), playing with gates is useful to master logic analysis. In this sense, modify
the SR latch of Figure 13.3(a) to convert it into a D latch (see the truth table in Figure 13.4(a)).


2. DL timing analysis 


Figure E13.2 shows a DL to which the signals clk and d also shown in the figure are applied.
Assuming that the DL's propagation delays (see Figure 13.6) are t_pCQ=2ns and t_pDQ=1ns, draw
the waveform for q. Assume that the initial value of q is '0' and that the clock period is 10ns.


FIGURE E13.2.


3. Bad circuit 


Figure E13.3 shows a D latch with q’ connected to d. Try to draw d and q and explain why this circuit 
is unstable. 


FIGURE E13.3.


4. SR flip-flop


Figure E13.4 shows the symbol for a positive-edge set-reset flip-flop (SRFF), along with its truth
table. It also shows a possible implementation, derived from a DFF, which has a section of
combinational logic (marked with a question mark; notice that it is not clocked). The purpose of this
exercise is to design that section.


FIGURE E13.4.


a. Based on the truth table of Figure E13.4, briefly explain how the SRFF works. 
b. Draw a corresponding circuit. 

5. JK flip-flop


This exercise is similar to that above, but it deals with a JKFF instead of an SRFF.
Figure E13.5 shows the symbol for a positive-edge JK flip-flop (JKFF), along with its truth table.
It also shows a circuit for a JKFF, derived from a DFF, which has a section of combinational logic
(marked with a question mark) that we want to design.


FIGURE E13.5. 


a. Based on the truth table of Figure E13.5, briefly explain how a JK flip-flop works.

b. Draw a corresponding circuit.


6. From DFF to DL


Consider the "downward" conversion of a D flip-flop into a D latch. Can it be done? Explain.


7. TFF timing analysis


Figure E13.7 shows a DFF operating as a TFF. Assuming that the propagation delays are t_pCQ=2ns
(for the DFF) and t_p_inv=1ns (for the inverter), draw the waveforms for d and q. Assume that the
initial value of q is '0' and that the clock period is 16ns. Adopt the simplified timing diagram style
of Figure 4.8(b).


FIGURE E13.7.


8. DFF functional analysis 


Figure E13.8 shows a positive-edge DFF that receives the signals clk, rst, and d included in the 
figure. Assuming that the propagation delays are negligible, draw the waveform for q. Assume that 
initially q='0'. 


FIGURE E13.8.


9. DFF timing analysis 


Figure E13.9 shows the same DFF and input signals seen in the previous exercise. Assuming that the
DFF's propagation delay from clk to q is t_pCQ=5ns and from rst to q is also t_pRQ=5ns, draw the
resulting waveform for q. Consider that the clock period is 50ns and that the DFF's initial state is q='0'.


FIGURE E13.9.


10. Modified DFF/TFF 


This exercise relates to the circuit shown in Figure E13.10. 


FIGURE E13.10.


a. With the switch in position A, draw the waveforms at points q, d, and x. Use the clock as
reference. What kind of circuit is this? Is it a frequency divider?


b. Repeat the exercise above but now with the switch in B. Analyze the resulting waveforms. Is it
a frequency divider?


11. Master-slave DFFs


a. Draw the schematics for a negative-edge DFF implemented according to the master-slave
approach using the dynamic latch of Figure 13.9(a) for the master and the static latch of
Figure 13.8(b) for the slave.


b. Repeat the exercise above using the static latch of Figure 13.8(f) in both stages.


12. Implicit-pulsed DFF


Figure 13.20(e) shows a dual-edge flip-flop that uses an EP (explicit-pulsed) arrangement to clock
the circuit. Using the same latch (SSTC1), convert it into an IP (implicit-pulsed) single-edge DFF.
(Suggestion: examine the lower part of the pulse-based DFF in Figure 13.19(c) and consider the
possibility of including just two transistors, like M2-M3, to implement the built-in short-pulse
generator.)


13. Precharge-evaluate flip-flops


a. Which among all (master-slave) DFFs in Figure 13.17 operate using the precharge-evaluate
principle?


b. Which among all (pulse-based) DFFs in Figure 13.19 operate using the precharge-evaluate
principle?


14. Dual-edge DFF with multiplexers


The dual-edge DFF of Figure 13.20(a) was implemented with two DLs and a multiplexer. How can
a similar flip-flop be constructed using only multiplexers? Draw the corresponding circuit. What is
the minimum number of multiplexers needed?


15. Dual-edge DFF with single-edge DFFs


The dual-edge DFF of Figure 13.20(a) was implemented with two D latches and a multiplexer.
Suppose that the DLs are replaced with single-edge DFFs. Analyze the operation of this circuit. Is it still
a dual-edge DFF? (Hint: Think about glitches.)


16. Dual-edge TFF


Figure E13.16 shows a dual-edge DFF configured as a TFF. Given the waveform for clk, draw the
waveform at the output (q). Comment on the usefulness of this circuit.


FIGURE E13.16.


17. DL with reset, clear, and preset


Figure E13.17 shows the same positive-level DL seen in Figure 13.8(b). 


FIGURE E13.17.


a. Make the modifications needed to introduce a reset port (active low, as depicted in the symbol 
on the right of Figure E13.17). Recall that in our context reset is asynchronous, while clear is 
synchronous. 


b. Suppose that instead of a reset port we want a flip-flop clear port (active low). Make the modi- 
fications needed to introduce it. 


c. Finally, make the modifications assuming that the desired port is a preset port (output forced to 
'1' asynchronously, again active low). 


18. DFF with clear 


Figure E13.18 shows a DFF implemented using the master-slave approach in which the DL of 
Figure 13.8(b) is employed in each stage. 


a. Is this a positive- or negative-edge DFF? 


b. Given the waveforms for clk and d depicted in the figure, draw the waveform for q (assume that 
the initial state of q is '0' and that the propagation delays are negligible). 


c. Make the modifications needed to include in this circuit a flip-flop clear input (recall again that 
in our context reset is asynchronous, while clear is synchronous, so the flip-flop should only be 
cleared at the proper clock edge). 


FIGURE E13.18.


19. DFF with enable #1 


A classical way of introducing an enable port into a DFF was shown in Figure 13.22(b), which is
repeated in Figure E13.19 below.


FIGURE E13.19.


a. Given the waveforms for clk, d, and ena, draw the waveform for q (assume that its initial state is
'0' and that the propagation delays are negligible).


b. Inspect the result and verify whether this enable arrangement is truly synchronous (that is, 
whether it becomes effective only at the rising edge of the clock). 


20. DFF with enable #2 


Another alternative to introduce an enable port is depicted in Figure E13.20, which shows an AND 
gate processing clk and ena, with the purpose of preventing clk from reaching the DFF when ena ='0'. 
This option, however, interferes with the clock, which is generally not recommended. It also exhibits 
another important difference (defect) with respect to the option depicted in Figure 13.22(b). 


FIGURE E13.20. 


a. Given the waveforms for clk, ena, and d, draw the waveform for q (notice that these waveforms
are the same as those in the previous exercise). To ease your task, draw first the waveform for
clk*. Assume that the initial state of q is '0' and that the propagation delays are negligible.


b. Inspect the result (q) and compare it with that obtained in the previous exercise. Is there any
difference? Why?


13.12 Exercises with SPICE 


See Chapter 25, Section 25.15. 


Sequential Circuits 


Objective: Sequential circuits are studied in Chapters 13 to 15. In Chapter 13, the fundamental 
building blocks (latches and flip-flops) were introduced and discussed at length. We turn now to the 
study of circuits that employ such building blocks. The discussion starts with shift registers, followed by 
the most fundamental type of sequential circuit, counters, and then several shift-register and/or counter- 
based circuits, namely signal generators, frequency dividers, prescalers, PLLs, pseudo-random sequence genera- 
tors, and data scramblers. This type of design will be further illustrated using VHDL in Chapter 22. 


Chapter Contents 


14.1 Shift Registers
14.2 Synchronous Counters
14.3 Asynchronous Counters
14.4 Signal Generators
14.5 Frequency Dividers
14.6 PLL and Prescalers
14.7 Pseudo-Random Sequence Generators
14.8 Scramblers and Descramblers
14.9 Exercises
14.10 Exercises with VHDL
14.11 Exercises with SPICE


14.1 Shift Registers 


Shift registers (SRs) are very simple circuits used for storing and manipulating data. As illustrated 
in Figure 14.1, they consist of one or more strings of serially connected D-type flip-flops (DFFs). In 
Figure 14.1(a), a four-stage single-bit SR is shown, while in Figure 14.1(b) a four-stage N-bit SR is 
depicted. 

The operation of an SR is very simple: Each time a positive (or negative, depending on the DFF) clock 
transition occurs, the data vector advances one position. Hence in the case of Figure 14.1 (four stages), 
each input bit (d) reaches the output (q) after four positive clock edges have occurred. 

Four applications of SRs are illustrated in Figure 14.2. In Figure 14.2(a), the SR operates as a
serial-in parallel-out (SIPO) memory. In Figure 14.2(b), an SR with a programmable initial state is presented
(load='1' causes x=x0x1x2x3 to be loaded into the flip-flops at the next positive edge of clk, while load='0'
causes the circuit to operate as a regular SR). This circuit also operates as a parallel-in serial-out (PISO)
memory. In Figure 14.2(c), a circular SR is depicted. Note that rst is connected to rst of all DFFs except
for the last, which has rst connected to its preset (pre) input. Therefore, the rotating sequence is fixed and
composed of three '0's and one '1'. Finally, in Figure 14.2(d), a programmable-block SR, known as tapped
delay line, is shown. It consists of several SR blocks with sizes that are powers of 2 interconnected by
means of multiplexers (Section 11.6). By setting the selection bits, sel(2:0), properly, a delay varying from
0 to 7 clock periods can be obtained (see the accompanying truth table).


FIGURE 14.1. (a) Single-bit and (b) multibit shift registers.


FIGURE 14.2. Applications of SRs: (a) Serial-in parallel-out (SIPO) memory; (b) SR with load capability or
parallel-in serial-out (PISO) memory; (c) Circular SR with a fixed rotating sequence ("0001"); (d) Tapped delay line.
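
The shift-register applications above can be mimicked with a few lines of behavioral code. The Python sketch below is an illustration only: the function names, the list-based representation of the flip-flops, and the zero-delay model are assumptions. In particular, the assignment of the 4-, 2-, and 1-stage blocks to sel(2), sel(1), and sel(0) in tapped_delay_line is a guess, because the truth table of Figure 14.2(d) is not reproduced in the text.

```python
def shift_register(bits, stages=4):
    """Single-bit SR: returns q of the last stage after each active clock edge."""
    regs = [0] * stages
    out = []
    for d in bits:                     # one input bit per clock edge
        regs = [d] + regs[:-1]         # every stage copies the stage before it
        out.append(regs[-1])
    return out


def circular_sr(init, edges):
    """Circular SR: the output of the last stage feeds the input of the first."""
    regs = list(init)
    seq = []
    for _ in range(edges):
        regs = [regs[-1]] + regs[:-1]
        seq.append(tuple(regs))
    return seq


def tapped_delay_line(samples, sel):
    """Delay samples by 0 to 7 clock periods; sel bit weights 4, 2, 1 are assumed."""
    delay = 0
    for block_size, bit in ((4, (sel >> 2) & 1), (2, (sel >> 1) & 1), (1, sel & 1)):
        if bit:                        # the multiplexer inserts this SR block in the path
            delay += block_size
    padded = [0] * delay + list(samples)
    return padded[:len(samples)]


print(shift_register([1, 0, 1, 1, 0, 0, 0, 0]))   # each bit appears at q after four edges
print(circular_sr([0, 0, 0, 1], 4))               # the single '1' rotates through the stages
print(tapped_delay_line([1, 2, 3, 4, 5, 6, 7, 8], 0b101))
```

Non-binary sample values are used in the last call only to make the five-period delay easy to see in the printed list.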


14.2 Synchronous Counters 


Counters are at the heart of many (or most) sequential systems, so a good understanding of their physical 
structures is indispensable. To achieve that purpose, an extensive analysis of internal details is presented 
in this chapter, along with numerous design considerations. The design of counters will be further illus- 
trated in Chapter 15 employing the finite-state-machine concept. Practical designs using VHDL will also 
be shown in Chapter 22. 

Counters can be divided into synchronous and asynchronous. In the former, the clock signal is connected 
to the clock input of all flip-flops, whereas in the latter the output of one flip-flop serves as clock to 
the next. 

They can also be divided into full-scale and partial-scale counters. The former is modulo-2^N because it
has 2^N states (where N is the number of flip-flops, hence the number of bits), thus spanning the complete
N-dimensional binary space. The latter is modulo-M, where M<2^N, thus spanning only part (M states)
of the corresponding binary space. For example, a 4-bit counter counting from 0 to 15 is a full-scale
(modulo-16) circuit, while a BCD (binary-coded decimal) counter (4 bits, counting from 0 to 9) is a
partial-scale (modulo-10) counter. Synchronous modulo-2^N and modulo-M counters are studied in this
section, while their asynchronous counterparts are seen in the next. The following six cases will be
described here:


Case 1: TFF-based synchronous modulo-2^N counters
Case 2: DFF-based synchronous modulo-2^N counters
Case 3: TFF-based synchronous modulo-M counters
Case 4: DFF-based synchronous modulo-M counters
Case 5: Counters with nonzero initial state
Case 6: Large synchronous counters


Case 1 TFF-based synchronous modulo-2^N counters


A synchronous counter has the clock signal applied directly to the clock input of all flip-flops. Two
classical circuits of this type are shown in Figure 14.3. All flip-flops are TFFs (Section 13.10) with a
toggle-enable port (t). Therefore, their internal structure is that depicted in Figure 13.24(b) or similar. The t
input of each stage is controlled by the outputs of all preceding stages, that is, t_k=q_(k-1)·...·q1·q0, with q0
representing the counter's LSB (least significant bit). If four stages are employed, then the output is
q3q2q1q0 = {"0000" → "0001" → "0010" → "0011" → ... → "1111" → "0000" → ...}, which constitutes a binary
0-to-15 (modulo-16) counter.

The counter of Figure 14.3(a) has the toggle-enable signals (t) computed locally at each stage, so it is
often referred to as synchronous counter with parallel enable. The counter of Figure 14.3(b), on the other
hand, has the toggle-enable signals computed serially, thus known as synchronous counter with serial
enable. The former is faster, but the modularity of the latter (one TFF plus one AND gate per cell) results
in less silicon space and generally also less power consumption.


FIGURE 14.3. Synchronous modulo-2^N counters with (a) parallel enable and (b) serial enable. All flip-flops are
positive-edge TFFs, where t is a toggle-enable port (Figure 13.24(b)). A partial timing diagram is shown in (c).


A partial timing diagram is shown in Figure 14.3(c). It depicts the behavior of the first three stages in
Figure 14.3(a), which produce the vector q2q1q0, thus counting from 0 ("000") to 7 ("111"). Looking at the
waveforms for q2, q1, and q0 (in that order), we observe the following sequence: "000" → "001" → "010"
→ "011" ... etc. In the first stage of Figure 14.3(a), t0 is permanently at '1', so the first TFF toggles every
time a positive clock transition occurs (highlighted by arrows in the clock waveform of Figure 14.3(c)),
producing q0 delayed by one TFF-delay with respect to the clock. In the second stage, t1=q0, causing it to
toggle once every two positive clock transitions, producing q1 also one TFF-delay behind clk. In the third
stage, t2=q0·q1, so it toggles once every four positive clock edges. Note that even though t2 takes one
AND-delay plus one TFF-delay to settle, q2 is only one TFF-delay behind clk. Finally, in the fourth stage,
t3=q0·q1·q2, so it toggles once every eight positive clock edges. Even though t3 takes one AND-delay
plus one TFF-delay to settle, again q3 is just one TFF-delay behind clk.

Note in the description above that the delay needed to produce the toggle-enable signals (t_k, k=2, 3, ...)
is larger than that to produce the flip-flop outputs (q_k, k=1, 2, ...), so the former is the determining factor
to the counter's maximum speed. Moreover, even though the parallel-enable approach of Figure 14.3(a) is
slightly faster than the serial-enable circuit of Figure 14.3(b), recall that a gate's delay grows with its fan-in
(number of inputs), so the advantage of the former over the latter is limited to blocks of four or so bits,
after which their speeds become comparable (but recall that the latter occupies less space and consumes
less power).

To understand why a TFF-based counter (Figure 14.3) needs an AND gate connected to each t 
input with this gate receiving all preceding flip-flop outputs, let us examine Figure 14.4, where 
the first column contains the desired output values (a binary counter). Inspecting that column, we 
verify that each output (q1, q2, ...) must change its value if and only if all preceding outputs are 
high. Therefore, because we must produce t='1' for the TFF to toggle, ANDing the outputs of all 
preceding stages is the proper solution. This was highlighted with rectangles in the q2 column (see 
that q2 changes only when q1 and q0 are both '1'), but it can be verified also in the other columns. In 
the particular case of q0, because we want it to change at every clock edge (of the proper polarity), 
t0 must be permanently '1'. The resulting toggle-enable signals are depicted in the third column of 
Figure 14.4. 

FIGURE 14.4. Generation of toggle-enable signals. 
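
The toggle-enable rule can be checked with a small behavioral model. The sketch below is only an illustration in Python (the function names are hypothetical and gate delays are ignored): each stage toggles when the AND of all preceding outputs is '1', with t0 fixed at '1'.

```python
def tff_counter_step(q):
    """Apply one positive clock edge to an N-bit synchronous TFF counter.

    q is a list of bits with q[0] as the LSB. Stage k toggles when
    t_k = q[0] AND q[1] AND ... AND q[k-1] equals 1; t_0 is always 1.
    """
    t = [int(all(q[:k])) for k in range(len(q))]  # all([]) is True, so t[0] = 1
    return [qk ^ tk for qk, tk in zip(q, t)]


def to_int(q):
    """Convert an LSB-first bit list to its decimal value."""
    return sum(bit << k for k, bit in enumerate(q))


state = [0, 0, 0, 0]
sequence = []
for _ in range(17):
    sequence.append(to_int(state))
    state = tff_counter_step(state)

print(sequence)
```

Under these assumptions the model walks through all 16 states in binary order and then wraps around to zero.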


Case 2 DFF-based synchronous modulo-2^N counters 


Figure 14.5 shows the implementation of synchronous modulo-2^N counters using DFFs instead of TFFs. 
The circuit in Figure 14.5(a) is equivalent to that in Figure 14.3(a), which operates with a parallel enable, 
while that in Figure 14.5(b) is similar to that in Figure 14.3(b), operating with a serial enable. In either 
case, instead of generating toggle-enable inputs, the circuit must generate data (d0, d1, ...) inputs (notice, 
however, that the portions within dark boxes resemble TFFs; see Figure 13.24). 

To understand the equation for d, let us examine the table in Figure 14.5(c), which shows the 
counter’s outputs for the first three stages. The first column presents the desired output values. Recall that, 
because the flip-flops are D type, the output simply copies the input at the respective clock transition; 
therefore, the values that must be provided for d are simply the system’s next state. In other words, 
when the output (...q2q1q0) is m (where m is a decimal value in the range 0≤m≤2^N-1), the next state 
(...d2d1d0) must be m+1 (except when m=2^N-1, because then the next state must be zero). This can be 
observed in the rightmost column of Figure 14.5(c). To attain these values for d, the table shows that d 
must change to '1' whenever its present value is '0' and all preceding bits are '1's, and it also shows that 
d must remain at '1' while the preceding bits are not all '1's. For example, for d2 this can be translated as 
d2=q2'·(q1·q0)+q2·(q1·q0)', which is the same as d2=q2⊕(q1·q0). This is precisely what the circuits of 
Figures 14.5(a) and (b) do. 
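
As a quick sanity check of this reasoning, the following Python sketch (an illustration only; the helper name d_inputs is hypothetical) computes each d input as the present output XORed with the AND of all preceding outputs, confirms that the result is always the next count, and confirms that the sum-of-products form for d2 equals the XOR form.

```python
from itertools import product


def d_inputs(q):
    """Return the DFF inputs for an LSB-first bit list q.

    d_k = q_k XOR (q_0 AND ... AND q_{k-1}); for k = 0 the AND of an
    empty set is 1, so d_0 is simply the complement of q_0.
    """
    return [qk ^ int(all(q[:k])) for k, qk in enumerate(q)]


N = 3
next_is_increment = True
for m in range(2 ** N):
    q = [(m >> k) & 1 for k in range(N)]
    nxt = sum(bit << k for k, bit in enumerate(d_inputs(q)))
    next_is_increment &= nxt == (m + 1) % 2 ** N

# d2 = q2'.(q1.q0) + q2.(q1.q0)' versus d2 = q2 XOR (q1.q0)
forms_agree = all(
    (((1 - q2) & (q1 & q0)) | (q2 & (1 - (q1 & q0)))) == (q2 ^ (q1 & q0))
    for q2, q1, q0 in product((0, 1), repeat=3)
)

print(next_is_increment, forms_agree)
```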

Another DFF-based synchronous full-scale counter is illustrated in Figure 14.6. Although this 
kind of implementation is not common, examining its architecture is a good exercise. An adder 
(Sections 12.2 and 12.3) or an incrementer (Section 12.6) is employed along with regular DFFs. If an 
adder is used, then the DFF outputs (q0, q1, ...) are connected to one of its inputs, while 1 (="0...001") 
is applied to the other input, with the sum fed back to the flip-flops, causing the circuit to increment its 
output by one unit every time a positive clock edge occurs. A similar reasoning holds for the case when 
an incrementer is used. 

FIGURE 14.5. Synchronous modulo-2^N counters implemented using regular DFFs instead of TFFs: (a) With 
parallel enable; (b) With serial enable (the portions within dark boxes implement TFFs); (c) Generation of the 
DFF inputs (d0, d1, ...). 

FIGURE 14.6. Another synchronous modulo-2^N counter, which employs an adder/incrementer and DFFs. 


Case 3 TFF-based synchronous modulo-M counters 


All counters described above are full-scale counters because they span the whole binary space (2^N states, 
where N is the number of flip-flops, hence the number of bits). For a modulo-M counter (where M<2^N), 
some type of comparator must be included in the system, such that upon reaching the desired final value 
the system can return to and continue from its initial state. 
FIGURE 14.7. Incorrect modulo-M counter implementation. 


To construct such a circuit, one might be tempted to use a system like that in Figure 14.7 with a full-scale 
counter feeding one input of the comparator and a reference value (for reset) feeding the other. In this 
case, the comparator would be able to generate a reset signal whenever the reference value is reached. 

There are, however, two major flaws in this approach. When the counter’s output changes from one 
state to another, it might very briefly go through several states before settling into its definite state. 
This is particularly noticeable when the MSB changes because then all bits change (for example, 
"0111" → "1000"). Consequently, it is possible that one of these temporary states coincides with the 
reference value, which would inappropriately reset the counter (recall that reset is an asynchronous input). 
The other problem is that upon reaching the reference value the counter is immediately reset, implying 
that the reference value would have to be “final value +1,” in which state the counter would remain for a 
very brief moment, thus causing a glitch at the output. For these reasons, a different approach is needed, 
which consists of manipulating the main input (that is, t when using TFFs or d when using DFFs) without 
ever touching the reset input. This approach is summarized below. 

To construct a modulo-M synchronous counter with TFFs, any of the circuits shown in Figure 14.3 can 
be used but with some kind of toggling-control mechanism added to it. Let us start by considering the 
most common case, in which the TFFs are plain TFFs, as in Figure 13.24(b). And suppose, for example, 
that our circuit must be a 0-to-9 counter. Then the situation is the following: 


■ Desired final value: q3q2q1q0="1001" (=9) 

■ Natural next value: q3q2q1q0="1010" (=10; this is where the counter would go) 

■ Desired next (initial) value: q3q2q1q0="0000" (=0; this is where the counter must go) 

In this list, we observe the following: 


■ For q3: It is high in the final value and would continue high in the next state. Because we want it to 
be '0', this flip-flop must be forced to toggle (t3='1') when the final value is reached (that is, when 
q3=q0='1'; note that the '0's need no monitoring because there is no prior state in which q3=q0='1' 
occurs in a sequential binary counter; this, however, would not be the case in a counter with Gray 
outputs, for example). 


■ For q2: The value in the next state coincides with the desired initial value, so nothing needs to be done. 


■ For q1: It is '0' in the final value and would change to '1' in the next state. Because we want it to be 
'0', this flip-flop must be prevented from toggling (t1='0') when q3=q0='1' occurs. 


■ For q0: The value in the next state coincides with the desired initial value, so nothing needs to 
be done. 


The construction of this toggling-control mechanism is illustrated in the example below. However, it is 
important to observe that this procedure, though very simple and practical, does not guarantee that the Boolean 
expressions are irreducible (though they are irreducible in the light of the information that is available at 
this point). The reason is very simple: When the counter is a partial-scale one (like the 0-to-9 counter above), 
the method does not take advantage of the states that no longer occur (from 10 to 15), which might lead to 
smaller expressions (this will be demonstrated in Chapter 15, Example 15.4, using a finite state machine). 


■ EXAMPLE 14.1 SYNCHRONOUS 0-TO-9 COUNTER WITH REGULAR TFFS 


Design a synchronous 0-to-9 counter using regular TFFs. 


SOLUTION 


In this case, the situation is precisely that described above, that is: 
Desired final value: q3q2q1q0="1001" (=9) 

Natural next value: q3q2q1q0="1010" (=10) 

Desired next value: q3q2q1q0="0000" (=0) 


Therefore, q3 must be forced to toggle (thus t3='1') while q1 must be forced not to toggle (hence t1='0') 
when q3=q0='1' occurs. In Boolean terms, tnew=told when condition=FALSE or tnew='0' or '1' (depending 
on the case) when condition=TRUE, where condition=q3·q0. The following expressions then result 
for t3 and t1, with t3old and t1old taken from the original modulo-2^N circuit (Figure 14.8(a)): 


t3new = t3old·condition' + '1'·condition = t3old + condition (simplified using the absorption theorem) 

Hence t3new = t3old + q3·q0 = q2·q1·q0 + q3·q0. 

t1new = t1old·condition' + '0'·condition = t1old·condition' 

Hence t1new = t1old·(q3·q0)' = q0·(q3·q0)' = q3'·q0. 

The other two flip-flops (q2 and q0) need no modifications. The complete circuit is shown in 
Figure 14.8. In (a), the original (unchanged) modulo-2^N circuit, borrowed from Figure 14.3(a), is 
presented, where the TFF is any of those in Figure 13.24(b) or equivalent. In (b), the modifications 
needed to cause t1='0' and t3='1' (expressions above) when q3q2q1q0="1001" (=9) are depicted. The 
final circuit, with these modifications included, is shown in (c). 
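
The modified toggle equations can be exercised with a short behavioral model. This is only a sketch in Python under idealized assumptions (no propagation delays, counter starting at zero); the function name step is hypothetical.

```python
def step(q3, q2, q1, q0):
    """One clock edge of the 0-to-9 TFF counter with the modified t1 and t3."""
    t0 = 1                            # unchanged: LSB toggles on every edge
    t1 = (1 - q3) & q0                # t1new = q3'.q0
    t2 = q1 & q0                      # unchanged
    t3 = (q2 & q1 & q0) | (q3 & q0)   # t3new = q2.q1.q0 + q3.q0
    return q3 ^ t3, q2 ^ t2, q1 ^ t1, q0 ^ t0


state = (0, 0, 0, 0)
visited = []
for _ in range(11):
    visited.append(int("".join(map(str, state)), 2))
    state = step(*state)

print(visited)
```

Starting from zero, the model visits 0 through 9 and then returns to 0 without passing through 10.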


FIGURE 14.8. Synchronous 0-to-9 counter using regular TFFs (Example 14.1): (a) Original modulo-2^N counter 
(N=4); (b) Modifications needed to force t1='0' and t3='1' when q3q2q1q0="1001" (=9); (c) Final circuit. 
■ EXAMPLE 14.2 SYNCHRONOUS 0-TO-9 COUNTER USING TFFS WITH CLEAR 

a. Design a synchronous 0-to-9 counter using TFFs with a flip-flop clear port (Figure 13.24(c)). 


b. Discuss the advantages of this approach over that in Example 14.1. 


SOLUTION 


Part (a): 
The general situation here is the same as in Example 14.1, that is: 


Desired final value: q3q2q1q0="1001" (=9) 
Natural next value: q3q2q1q0="1010" (=10) 
Desired next (initial) value: q3q2q1q0="0000" (=0) 


Because clear is a synchronous input, all that is needed is a gate that produces clear='0' when the 
desired final value is reached (that is, when q3=q0='1'). This is illustrated in Figure 14.9(a). Because 
two of the flip-flops (q0 and q2) need no modifications, the dashed lines connecting their clr' inputs 
to the NAND gate are optional. Notice also that only q0 and q3 need to be connected to the NAND 
gate because in a sequential binary counter no other state previous to 9 (="1001") presents the same 
pattern of '1's. 


Part (b): 

The advantage of this architecture is its versatility. The same circuit can be used for any value of M. 
This is illustrated in Figure 14.9(b), which shows a 0-to-y counter, where y3y2y1y0 is the minterm 
corresponding to the desired final value (for example, y3y2y1y0=q3q2'q1'q0 when the final value is 
y="1001"=9). 
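
The programmable behavior can be sketched in Python as follows. This is a simplified model, not the circuit itself: the active-low clear and the NAND gate are abstracted into a Boolean "clear rule", and all function names are hypothetical. Two rules are compared, the full minterm and the shortcut that monitors only the '1's of y.

```python
def to_bits(value, n):
    """LSB-first bit list of value, n bits wide."""
    return [(value >> k) & 1 for k in range(n)]


def to_int(bits):
    return sum(b << k for k, b in enumerate(bits))


def clear_full_minterm(q, y):
    """Clear when every bit of q matches the corresponding bit of y."""
    return all(qk == yk for qk, yk in zip(q, y))


def clear_ones_only(q, y):
    """Clear when every bit that is '1' in y is also '1' in q."""
    return all(qk for qk, yk in zip(q, y) if yk)


def run(y_value, clear_rule, n=4):
    """Simulate two full periods of a 0-to-y counter with synchronous clear."""
    y, q, seen = to_bits(y_value, n), [0] * n, []
    for _ in range(2 * (y_value + 1)):
        seen.append(to_int(q))
        if clear_rule(q, y):
            q = [0] * n                                   # synchronous clear
        else:
            q = [qk ^ int(all(q[:k])) for k, qk in enumerate(q)]  # count up
    return seen


both_rules_work = all(
    run(y, rule) == list(range(y + 1)) * 2
    for y in range(1, 16)
    for rule in (clear_full_minterm, clear_ones_only)
)
print(run(9, clear_ones_only)[:11], both_rules_work)
```

In this model both rules give the same sequence for every y from 1 to 15, which agrees with the observation that no state before y in a sequential binary counter shows the same pattern of '1's.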


FIGURE 14.9. (a) Synchronous 0-to-9 counter using TFFs with clear (Example 14.2); (b) Programmable version 
(0-to-y counter), where y3y2y1y0 is the minterm corresponding to the desired final value (for example, if y=9= 
"1001", then y3y2y1y0=q3q2'q1'q0). ■ 
Case 4 DFF-based synchronous modulo-M counters 


To construct a modulo-M synchronous counter with DFFs instead of TFFs, any of the circuits of Figure 14.5 
can be used but with some kind of d-control mechanism added to it. Let us assume the most general case 
in which the DFFs are just plain D-type flip-flops with no special clear or enable ports. And, as an example, 
let us consider again the 0-to-9 counter discussed above. The situation is then the following: 

Desired final value: q3q2q1q0="1001" (=9) 

Natural next value: q3q2q1q0="1010" (=10; this is where the counter would go) 

Desired next (initial) value: q3q2q1q0="0000" (=0; this is where the counter must go) 

In this list, we observe the following: 


For q3: Its next value is '1', but '0' is wanted, so its input must be forced to be d3='0' when the final 
value is reached (that is, when q3=q0='1'; note that again the '0's need no monitoring because there is 
no prior state in which q3=q0='1' occurs). 

For q2: Its next value is '0', which coincides with the desired value, so nothing needs to be done. 

For q1: Its next value is '1', but '0' is wanted. Thus its input must be forced to be d1='0' when q3=q0='1' occurs. 

For q0: Its next value is '0', which coincides with the desired value, so nothing needs to be done. 

The construction of this input-control mechanism is illustrated in the example below. However, similarly 
to the previous case, it is important to observe that this procedure, though very simple and practical, 
does not necessarily lead to irreducible Boolean expressions. This is because it does not take advantage 
of the states that cannot occur (from 10 to 15), which could lead to smaller expressions (this will be 
demonstrated in Chapter 15, Example 15.4, using the finite-state-machine approach). 


■ EXAMPLE 14.3 SYNCHRONOUS 0-TO-9 COUNTER WITH REGULAR DFFS 


Design a synchronous 0-to-9 counter using regular DFFs. 


SOLUTION 


From the description above we know that d3=d1='0' must be produced when q3=q0='1'. In Boolean 
terms, dnew=dold when condition=FALSE or dnew='0' when condition=TRUE, where condition=q3·q0. 
The following expressions then result for d3 and d1, with the values of d3old and d1old picked from the 
original modulo-2^N circuit (Figure 14.10(a)): 


d3new = d3old·condition' + '0'·condition = d3old·condition' 

Hence d3new = d3old·(q3·q0)' = [q3⊕(q2·q1·q0)]·(q3·q0)' = q3·q0' + q3'·q2·q1·q0 

d1new = d1old·condition' + '0'·condition = d1old·condition' 

Hence d1new = d1old·(q3·q0)' = (q1⊕q0)·(q3·q0)' = q1·q0' + q3'·q1'·q0 


The complete circuit, with these modifications included, is presented in Figure 14.10(c). 
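
The simplification steps above can be verified exhaustively. The Python sketch below (illustrative only; function names are hypothetical) compares the gated form dold·condition' with the simplified sum-of-products form for all 16 input combinations, then steps the counter from zero.

```python
from itertools import product


def inv(b):
    return 1 - b


def d_gated(q3, q2, q1, q0):
    """d inputs written as d_old AND condition', with condition = q3.q0."""
    keep = inv(q3 & q0)
    return ((q3 ^ (q2 & q1 & q0)) & keep, q2 ^ (q1 & q0), (q1 ^ q0) & keep, inv(q0))


def d_simplified(q3, q2, q1, q0):
    """d inputs using the simplified expressions for d3new and d1new."""
    d3 = (q3 & inv(q0)) | (inv(q3) & q2 & q1 & q0)
    d1 = (q1 & inv(q0)) | (inv(q3) & inv(q1) & q0)
    return (d3, q2 ^ (q1 & q0), d1, inv(q0))


forms_agree = all(
    d_gated(*q) == d_simplified(*q) for q in product((0, 1), repeat=4)
)

state, visited = (0, 0, 0, 0), []
for _ in range(11):
    visited.append(int("".join(map(str, state)), 2))
    state = d_simplified(*state)   # a DFF simply copies d at the clock edge

print(forms_agree, visited)
```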
FIGURE 14.10. Synchronous 0-to-9 counter using regular DFFs (Example 14.3): (a) Original modulo-2^N counter 
(N=4); (b) Modifications needed to force d1='0' and d3='0' when q3q2q1q0="1001" (=9); (c) Final circuit. 


EXAMPLE 14.4 SYNCHRONOUS 0-TO-9 COUNTER USING DFFS WITH CLEAR 


a. Design a synchronous 0-to-9 counter using DFFs with a synchronous flip-flop clear port 
(Figure 13.22(c)). 


b. Discuss the advantages of this approach over that in Example 14.3. 


SOLUTION 


Part (a): 

Figure 14.11(a) shows the same modulo-2^N (0-to-15) counter of Figure 14.5(a) in which the 
DFFs do not exhibit a flip-flop clear input. We know that for this circuit to count only up to 9, 
some kind of zeroing mechanism must be included. To do so, in Figure 14.11(b) an AND 
gate was introduced between the XOR output and the DFF input, which allows the DFFs to 
be synchronously cleared when the other input to the AND gate is clear='0'. From the 
previous example we know that for the counter to count from 0 to 9, d1='0' and d3='0' must be 
produced when q3=q0='1'. This, however, does not mean that d0 and d2 cannot be zeroed as 
well. Consequently, the general architecture of Figure 14.11(b) results, which is a programmable 
0-to-y counter, where y3y2y1y0 is again the minterm corresponding to the desired final value, y. 
Because in the present example it must count from 0 to 9, y3y2y1y0=q3q2'q1'q0 should be employed 
because y=9="1001". 

FIGURE 14.11. (a) Synchronous 0-to-15 (modulo-2^N) counter using regular DFFs; (b) Programmable version 
(0-to-y counter), where y3y2y1y0 is the minterm corresponding to the desired final value (for example, 
y3y2y1y0=q3q2'q1'q0 when y=9="1001"). 


Part (b): 
The advantage of this architecture is its versatility. Like that in Example 14.2, by simply changing the 
connections to y3y2y1y0 any 4-bit 0-to-y counter can be obtained. ■ 


Case 5 Counters with nonzero initial state 


In the discussions above, all modulo-M counters were from 0 to M-1. We will consider now the case 
when the initial state (m) is not zero, that is, the circuit counts from m to M+m-1 (m>0). Though the 
general design procedure is still the same as above, the following two situations must be considered: 


a. The number of flip-flops used in the implementation is minimal. 


b. The number of flip-flops is not minimal (for example, it could be the same number as if it were a 
0-to-(m+M-1) counter). 


Suppose, for example, that we need to design a 3-to-9 counter. This circuit has only M=7 states, 
so ⌈log2 M⌉=3 flip-flops suffice. However, we can also implement it as if it were a 0-to-9 counter, in 
which case ⌈log2(m+M)⌉=4 flip-flops are needed. The advantage of (a) is that it requires fewer 
flip-flops, but it also requires additional combinational logic to convert the counter’s output (3 bits) 
into the actual circuit output (4 bits), which also causes additional time delay. In summary, in 
high-performance designs, particularly when there is an abundance of flip-flops (like in FPGAs, Chapter 
18), saving flip-flops is not necessarily a good idea. Both situations are depicted in the examples that 
follow. 
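
The two flip-flop counts for the 3-to-9 counter can be reproduced with a couple of lines of Python. The helper name below is hypothetical; it uses integer bit lengths rather than floating-point logarithms so that exact powers of two are handled correctly.

```python
def flip_flops_needed(num_states):
    """Smallest N such that 2**N >= num_states, that is, ceil(log2(num_states))."""
    return (num_states - 1).bit_length()


m, M = 3, 7                                # 3-to-9 counter: 7 states starting at 3
minimal = flip_flops_needed(M)             # option (a): count 0 to 6, then convert
as_zero_based = flip_flops_needed(m + M)   # option (b): treat it as a 0-to-9 counter

print(minimal, as_zero_based)
```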




■ EXAMPLE 14.5 SYNCHRONOUS 3-TO-9 COUNTER WITH FOUR DFFS 


Design a synchronous 3-to-9 counter using four regular DFFs. 


SOLUTION 
The situation is now the following: 


Desired final value: q3q2q1q0="1001" (=9) 
Natural next value: q3q2q1q0="1010" (=10; this is where the counter would go) 
Desired next value: q3q2q1q0="0011" (=3; this is where the counter must go) 


In this list, we observe the following: 


For q3: Its next value is '1', but '0' is wanted. Thus its input must be forced to be d3='0' when 
q3=q0='1'. 

For q2: Its next value is '0', which coincides with the desired value, so nothing needs to be done. 

For q1: Its next value is '1', which coincides with the desired value, so nothing needs to be done. 

For q0: Its next value is '0', but '1' is wanted. Thus its input must be forced to be d0='1' when 
q3=q0='1'. 

From the description above we know that d3='0' and d0='1' must be produced when q3=q0='1'. 
In Boolean terms, d3new=d3old when condition=FALSE or d3new='0' when condition=TRUE, where 
condition=q3·q0; likewise, d0new=d0old when condition=FALSE or d0new='1' when condition=TRUE. 
The following expressions then result for d3 and d0, with the values of d3old and d0old picked from the 
original modulo-2^N circuit (Figure 14.12(a)): 

d3new = d3old·condition' + '0'·condition = d3old·condition' 

Hence d3new = d3old·(q3·q0)' = [q3⊕(q2·q1·q0)]·(q3·q0)' = q3·q0' + q3'·q2·q1·q0 

d0new = d0old·condition' + '1'·condition 

Hence d0new = d0old·(q3·q0)' + q3·q0 = q0'·(q3·q0)' + q3·q0 = q0' + q3 (simplified using the absorption theorem) 


The complete circuit, with these modifications included, is presented in Figure 14.12(b). 
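
A short behavioral check of these equations is sketched below in Python. It assumes the counter has already been brought to its initial state 3 (how that happens at power-up is outside the scope of this sketch), and the function name is hypothetical.

```python
def next_state(q3, q2, q1, q0):
    """Next state of the 3-to-9 counter built with four plain DFFs."""
    d3 = (q3 & (1 - q0)) | ((1 - q3) & q2 & q1 & q0)  # d3new
    d2 = q2 ^ (q1 & q0)                               # unchanged
    d1 = q1 ^ q0                                      # unchanged
    d0 = (1 - q0) | q3                                # d0new = q0' + q3
    return d3, d2, d1, d0


state = (0, 0, 1, 1)   # assumed starting point: "0011" (=3)
visited = []
for _ in range(8):
    visited.append(int("".join(map(str, state)), 2))
    state = next_state(*state)

print(visited)
```

In this model the state following "1001" (=9) is "0011" (=3), as required.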


FIGURE 14.12. Synchronous 3-to-9 counter using four regular DFFs (Example 14.5): (a) Original modulo-2^N 
counter (N=4); (b) Final circuit, with modifications introduced in d0 and d3 (gray areas). 
■ EXAMPLE 14.6 SYNCHRONOUS 3-TO-9 COUNTER WITH THREE DFFS 


Design a synchronous 3-to-9 counter using the minimum number of DFFs (that is, three). 


SOLUTION 

We need to design a 3-bit counter with M=7 states then convert its 3-bit output to the desired 4-bit 
output. A regular 0-to-6 counter was chosen, thus resulting: 

Desired final value: q2q1q0="110" (=6) 

Natural next value: q2q1q0="111" (=7; this is where the counter would go) 

Desired initial value: q2q1q0="000" (=0; this is where the counter must go) 



FIGURE 14.13. Synchronous 3-to-9 counter using the minimum number of DFFs (Example 14.6): (a) Original 
modulo-2^N counter (N=3); (b) Modifications needed to turn it into a 0-to-6 counter; (c) Counter with 
modifications included (0-to-6 counter); (d) Truth table and Karnaugh maps to convert the 3-bit counter output into 
the desired 4-bit output; (e) Conversion circuit. 
In this list, we observe the following: 

For q2: Its next value is '1', but '0' is wanted. Thus its input must be forced to be d2='0' when the final 
value is reached (that is, when q2=q1='1'). 

For q1: Same as above, so d1='0' is needed when q2=q1='1'. 

For q0: Same as above, so d0='0' is needed when q2=q1='1'. 


In summary, d2=d1=d0='0' must be produced when q2q1q0="110". In Boolean terms, dnew=dold 
when condition=FALSE or dnew='0' when condition=TRUE, where condition=q2·q1. The following 
expressions then result for d2, d1, and d0, with the values of d2old, d1old, and d0old picked from the 
original modulo-2^N circuit (Figure 14.13(a)): 

d2new = d2old·condition' + '0'·condition = d2old·condition' 

Hence d2new = d2old·(q2·q1)' = [q2⊕(q1·q0)]·(q2·q1)' = q2·q1' + q2'·q1·q0 

d1new = d1old·condition' + '0'·condition = d1old·condition' 

Hence d1new = d1old·(q2·q1)' = (q1⊕q0)·(q2·q1)' = q1'·q0 + q2'·q1·q0' 

d0new = d0old·condition' + '0'·condition = d0old·condition' 

Hence d0new = d0old·(q2·q1)' = q0'·(q2·q1)' = q2'·q0' + q1'·q0' 


This design is illustrated in Figure 14.13. In (a), the original modulo-8 counter is depicted (extracted 
from Figure 14.5). In (b), the modifications (equations above) needed for it to count from 0 to 6 are 
shown. In (c), the resulting counter circuit is presented. In (d), the truth table and corresponding 
Karnaugh maps for the conversion of the three counter outputs (q0, q1, q2) into the desired four 
outputs (x0, x1, x2, x3) are shown, from which we obtain: 


x3 = q2·q1 + q2·q0 
x2 = q2'·q1 + q2'·q0 + q2·q1'·q0' 
x1 = q1·q0 + q1'·q0' 
x0 = q0' 

The resulting circuit is presented in (e). This is a good example where the minimization of the 
number of flip-flops is not advantageous because to save one DFF substantial additional logic was 
needed, which also causes the counter to be slower and possibly to also consume more power. ■ 
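
The counter and the conversion logic of this example can be checked together with the Python sketch below. It is a behavioral illustration only; the function names are hypothetical, and the conversion is taken to be "counter value plus 3", with the unused state "111" treated as a don't-care.

```python
def inv(b):
    return 1 - b


def counter_next(q2, q1, q0):
    """Next state of the 0-to-6 counter (three DFFs)."""
    d2 = (q2 & inv(q1)) | (inv(q2) & q1 & q0)
    d1 = (inv(q1) & q0) | (inv(q2) & q1 & inv(q0))
    d0 = (inv(q2) & inv(q0)) | (inv(q1) & inv(q0))
    return d2, d1, d0


def convert(q2, q1, q0):
    """Combinational conversion of the 3-bit count into the 4-bit output."""
    x3 = (q2 & q1) | (q2 & q0)
    x2 = (inv(q2) & q1) | (inv(q2) & q0) | (q2 & inv(q1) & inv(q0))
    x1 = (q1 & q0) | (inv(q1) & inv(q0))
    x0 = inv(q0)
    return 8 * x3 + 4 * x2 + 2 * x1 + x0


state, counts, outputs = (0, 0, 0), [], []
for _ in range(8):
    counts.append(4 * state[0] + 2 * state[1] + state[2])
    outputs.append(convert(*state))
    state = counter_next(*state)

print(counts)
print(outputs)
```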


Case 6 Large synchronous counters 
The two main ways of constructing a large synchronous counter are the following: 


■ With a serial enable structure: In this case, either the circuit of Figure 14.3(b) or that of Figure 14.5(b) 
can be used depending on whether TFFs or DFFs are going to be employed, respectively. Both of 
these structures utilize a standard cell that is not affected by the counter’s size. 


■ With a mixed parallel enable plus serial enable structure: In this case, several blocks are associated 
in series, each containing a counter with parallel enable (typically with four or so stages) like that 
in Figure 14.3(a) or 14.5(a), with such blocks interconnected using a serial enable. This approach, 
illustrated in Figure 14.14, is slightly faster than that in (a), but the circuit consumes a little more 
silicon space. Note that an additional wire is needed for interstage transmission of the serial 
enable signal (Ty— Toyz). Recall also that the gate delay grows with the fan-in, and note that the 
fan-in of the last gate in Figure 14.14(a) is already 5. 

FIGURE 14.14. Construction of large synchronous counters using a serial-enable association of parallel-enable 
blocks. 


14.3 Asynchronous Counters 


Asynchronous counters require less hardware space (and generally also less power) than their synchro- 
nous counterparts. However, due to their serial clock structure, they are also slower. The following cases 
will be described: 


Case 1: Asynchronous modulo-2^N counters 
Case 2: Asynchronous modulo-M counters 


Case 1 Asynchronous modulo-2^N counters 


Figure 14.15 shows classical asynchronous full-scale counter implementations. In Figure 14.15(a), all flip- 
flops are TFFs, with no toggle-enable input (t). The actual clock signal is applied only to the first flip-flop 
and the output of each stage serves as input (clock) to the next stage. This circuit (as well as all the others 
in Figure 14.15) is a downward counter because it produces q3q2q1q0={"1111" → "1110" → "1101" → ... 
→ "0000" → "1111" → ...}, where q0 is again the LSB. This sequence can be observed in the partial 
timing diagram shown in Figure 14.15(d). Notice that q0 is one TFF-delay behind clk, q1 is two TFF-delays 
behind clk, and so on. In Figure 14.15(b), the most common choice for the TFF’s internal structure is 
depicted (DFF-based), which corresponds to that seen in Figure 13.24(a). Finally, in Figure 14.15(c), the 
same type of counter is shown but now with a counter-enable input (ena), which is connected to the t 
(toggle-enable) input of each TFF. When ena ='1', the counter operates as usual, but it stops counting and 
remains in the same state when ena='0'. These TFFs can be implemented with any of the structures seen 
in Figure 13.24(b). 

FIGURE 14.15. Asynchronous modulo-2^N downward counters: (a) Using TFFs without toggle-enable input; 
(b) Showing the most common choice for the TFF’s internal structure (DFF-based); (c) Same counter, now with 
a global counter-enable (ena) input; (d) Partial timing diagram. 


As mentioned above, the counters of Figure 14.15 count downward. To have them count upward, a 
few alternatives are depicted in Figure 14.16, with TFFs employed in the first two diagrams and DFFs in 
the last two. In Figure 14.16(a), q' is used instead of q to feed the clock to the next stage. In Figure 14.16(b), 
negative-edge TFFs are utilized. Equivalent circuits are shown in Figures 14.16(c) and (d), but with DFFs 
instead of TFFs (notice that the portions within dark boxes implement TFFs). 
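
The difference between the downward and upward ripple counters can be illustrated with a small Python model. It is a sketch under simplifying assumptions: all flip-flops are positive-edge TFFs, propagation delays are ignored, and the function and parameter names are hypothetical. Each stage toggles only if its clock input sees a rising edge, and that clock is either q or q' of the previous stage.

```python
def ripple_step(q, clock_from_complement=False):
    """Apply one rising edge of clk to a ripple counter (q[0] is the LSB)."""
    q = list(q)
    edge = True                      # the external clk edge reaches stage 0
    for k in range(len(q)):
        if not edge:
            break                    # no edge arrived, so later stages hold
        rose = q[k] == 0             # this stage is about to go from 0 to 1
        q[k] ^= 1
        # The next stage sees a rising edge when q rises (clock taken from q)
        # or when q falls (clock taken from q').
        edge = (not rose) if clock_from_complement else rose
    return q


def to_int(q):
    return sum(bit << k for k, bit in enumerate(q))


down, state = [], [1, 1, 1, 1]
for _ in range(17):
    down.append(to_int(state))
    state = ripple_step(state)

up, state = [], [0, 0, 0, 0]
for _ in range(17):
    up.append(to_int(state))
    state = ripple_step(state, clock_from_complement=True)

print(down)
print(up)
```

Clocking each stage from q reproduces the downward sequence, while clocking from q' (as in Figure 14.16(a)) turns the same structure into an upward counter.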


Case 2 Asynchronous modulo-M counters 


The simplest way of causing an asynchronous counter (Figure 14.16) to return to zero is by resetting 
all flip-flops when output=M occurs (where M is the number of states). This is the approach depicted 
earlier in Figure 14.7, which, as described in Case 3 of Section 14.2, is not adequate for synchronous circuits 
because it exhibits two major flaws. 

What might make it acceptable here is that one of the flaws does not occur in asynchronous counters 
due to the clearly determined temporal relationship among the output signals, which was already 
illustrated in the timing diagram of Figure 14.15(d), showing that q1 can only change after q0, q2 after q1, q3 after 
q2, and so on. Therefore, there is no risk of accidentally resetting the flip-flops during state transitions. 
Then all that needs to be done is to monitor the bits that must be '1' when the counter reaches the state 
immediately after the desired final state (output=M). For example, if it is a 0-to-5 counter, then q2 and q1 
must be monitored (with an AND gate), because q2q1q0="110" (=6) is the next natural state. 

FIGURE 14.16. Asynchronous modulo-2^N upward counters constructed with (a) and (b) TFFs or (c) and (d) DFFs. 

Note, however, that one problem still remains with this simplistic approach because the system will 
necessarily enter the (undesired) output=M state, where it remains for a very brief moment, thus caus- 
ing a glitch at the output before being reset to zero. In many applications, this kind of glitch is not a 
problem, but if it is not acceptable in a particular application, then one of the reset mechanisms described 
for the synchronous counters should be considered (at the expense of more hardware). Recall that such 
a glitch does not occur when it is a full-scale counter. 


■ EXAMPLE 14.7 ASYNCHRONOUS 0-TO-5 COUNTER 


Design an asynchronous 0-to-5 counter and draw its timing diagram. Assume that the propagation 
delays in the flip-flops are tpCQ=2ns (from clk to q; see Figure 13.12) and tpRQ=1ns (from rst to q), 
and that in any other gate it is tp=1ns. 


SOLUTION 


The circuit of Figure 14.16(b) was used with TFFs equipped with a reset input. The resulting circuit 
is depicted in Figure 14.17. An AND gate monitors q2 and q1, producing rst='1' when q2q1q0="110" 
(=6) is reached, causing the counter to return to zero. The corresponding timing diagram is included 
in Figure 14.17. As can be observed, this approach, though simple, does exhibit a brief glitch when 
the output value is 6 (the glitch occurs in q1). The time delays are depicted in the inset, where the 
vertical lines are 1 ns apart (it takes 4ns for the glitch to occur and it lasts 2ns). 
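
The 4 ns and 2 ns figures follow from adding the delays along the ripple path during the transition from 5 ("101") to 6 ("110"). The Python lines below spell out that arithmetic; the constant names are hypothetical labels for the delays given in the problem statement.

```python
T_CQ = 2    # ns, clk-to-q delay of each flip-flop
T_RQ = 1    # ns, rst-to-q delay of each flip-flop
T_GATE = 1  # ns, delay of the AND gate that drives rst

# Times are measured from the active clk edge that takes the counter from 5 to 6.
t_q0_falls = T_CQ                   # first stage responds to clk
t_q1_rises = t_q0_falls + T_CQ      # second stage is clocked by q0
t_rst_high = t_q1_rises + T_GATE    # AND gate now sees q2 = q1 = '1'
t_q1_cleared = t_rst_high + T_RQ    # reset forces q1 (and q2) back to '0'

glitch_start = t_q1_rises
glitch_length = t_q1_cleared - t_q1_rises
print(glitch_start, glitch_length)
```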


FIGURE 14.17. Asynchronous 0-to-5 counter of Example 14.7. ■ 


14.4 Signal Generators 


This section describes the generation of irregular square waves, which constitutes a typical application 
for counters. By definition, a signal generator is a circuit that takes the clock as its main input and from 
it produces a predefined glitch-free signal at the output. 

The design technique described here is a simplified procedure. In Chapter 15, a formal design 
technique, based on finite state machines, will be studied, and in Chapter 23 such a technique will be 
combined with VHDL to allow the construction of more complex signals. As will be shown, the main 
drawback of a simplified procedure is that it is more difficult to minimize the number of flip-flops. The 
overhead in this design, however, is just one flip-flop, and the counter employed in the implementation 
is a regular binary counter (Sections 14.2 and 14.3). 

An example of a signal generator is depicted in Figure 14.18. The only input is clk, from which the 
signal called q must be derived. Because q must stay low during three clock periods and high during 
five periods (so T=8T0), an eight-state counter is needed. This means that there exists a circuit, which 
employs only three flip-flops (because 2^3=8), whose MSB corresponds to the desired waveform. The 
problem is that such a counter is not a regular (sequential) binary counter because then the outputs 
would look like those in Figure 14.3(c), where q2 is not equal to the desired signal, q. One of the simplest 
solutions to circumvent this problem is to still employ a regular binary counter, but then add an extra 
flip-flop to convert its output into the desired shape. 

This design technique can be summarized as follows: Suppose that q has two time windows (as in 
Figure 14.18), and that x and y are the counter values corresponding to the end of the first and second 
time windows, respectively (for example, x=2 and y=7). Take a regular (sequential) binary 0-to-y counter, 
then add an extra flip-flop to store q, whose input must be d='1' when counter=x, d='0' when counter=y, 
or d=q otherwise. This can be easily accomplished with OR+AND gates for x and with NAND+AND 
gates for y as illustrated in Example 14.8 below. Moreover, if the signal generator must produce a 
waveform with more than two time windows, the extension of this procedure is straightforward, as shown in 
Example 14.9. 

FIGURE 14.18. Signal generator example. 

Still referring to Figure 14.18, note that the actual values when the circuit turns q high or low are not 
important, as long as they are separated by five clock periods and the total number of clock cycles is eight. 
In other words, 1 and 6 would also do, as would 3 and 0, etc. Another option would be to construct a 
0-to-4 counter that counts alternately from 0 to 2 and then from 0 to 4, but this would save at most one 
flip-flop while requiring extra logic to operate the counter, being therefore generally not recommended. 
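
The rule "d='1' at x, d='0' at y, d=q otherwise" and the freedom in choosing x and y can be tried out with the Python sketch below. It is a behavioral model only (the function name and its parameters are hypothetical): the counter and the extra flip-flop both update on the same clock edge, and q is sampled once per clock period.

```python
def generate(x, y, modulus, edges):
    """Return q sampled once per clock period for a 0-to-(modulus-1) counter."""
    counter, q, samples = 0, 0, []
    for _ in range(edges):
        samples.append(q)
        d = 1 if counter == x else 0 if counter == y else q
        counter, q = (counter + 1) % modulus, d   # both update on the same edge
    return samples


wave = generate(x=2, y=7, modulus=8, edges=16)
print(wave[:8])

# Other (x, y) pairs that also keep q high for five of the eight periods;
# the second period is inspected so that the start-up value of q does not matter.
high_periods = {
    (x, y): sum(generate(x, y, 8, 16)[8:]) for (x, y) in [(2, 7), (1, 6), (3, 0)]
}
print(high_periods)
```

With x=2 and y=7 the samples within one period read three '0's followed by five '1's, and the other two pairs give the same high time with the waveform shifted.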

Finally, observe that in Figure 14.18 all transitions of q are at the same (rising) edge of the clock, so that 
the generator is said to be a single-edge circuit. Only single-edge generators will be studied in this chapter. 
For the design of complex, dual-edge circuits, a formal design procedure, utilizing finite state machines, 
will be seen in Chapter 15, which also allows the minimization of the number of flip-flops. 


■ EXAMPLE 14.8 TWO-WINDOW SIGNAL GENERATOR 


Figure 14.19(a) depicts a signal q to be generated from the clock, which has two time windows, the 
first with q='0' and duration 10T0, and the second with q='1' and duration 20T0, where T0 is the clock 
period. Design a circuit capable of producing this signal. 


counter   d 
x         1 
y         0 
others    q 

FIGURE 14.19. Two-window signal generator of Example 14.8. 


SOLUTION 


A 0-to-29 counter can be employed here (because the system needs 30 states), whose output 
values of interest are listed in Figure 14.19(b), showing x="01001" (=9) and y="11101" (=29). Because 
the counter is a regular binary counter, its design was already covered in Sections 14.2 and 14.3 of 
this chapter. When the counter is available, obtaining q is very simple, as shown in Figure 14.19(c). 
As mentioned before, an extra DFF is needed to store q, plus two pairs of gates. The OR+AND pair is 
used to process x, causing d='1' when the counter reaches x, while the NAND+AND pair processes 
y, producing d='0' when y is reached; in all other counter states, d=q (see the truth table in Figure 
14.19(c)). For x, all bits must be monitored, including the '0's; because x="01001", q4'q3q2'q1'q0 is the 
minterm to be processed (there are cases when not all bits need to be monitored, but then the whole 
truth table must be examined to verify that possibility). For y, only the '1's are needed (because y is 
the counter's last value, so no prior value will exhibit the same pattern of '1's); because y="11101", 
q4q3q2q0 must be monitored. Finally, notice that in fact the gate processing y in Figure 14.19(c) is 
not needed because this information is already available in the counter (not shown). 

Note: It will be shown in Chapter 15 (Example 15.6) that an “irregular” counter (that is, one whose 
sequence of states is neither sequential nor Gray or any other predefined encoding style) suffices to 
solve this problem (in other words, the circuit can be implemented without the extra DFF shown in 
Figure 14.19(c)). However, we are establishing here a systematic solution for this kind of problem, so 
the extra flip-flop is indeed necessary. 


EXAMPLE 14.9 FOUR-WINDOW SIGNAL GENERATOR 


Figure 14.20(a) depicts a signal, q, to be generated from the clock, which has four time windows, the first 
with q='0' and duration 17T0, the second with q='1' and duration 8T0, the third with q='0' and duration 
7T0, and the last with q='1' and duration 28T0. Design a circuit capable of producing this signal. 


SOLUTION 


A sequential 0-to-59 binary counter can be employed in this case, whose main outputs are listed in 
Figure 14.20(b), showing x1="010000" (=16), y1="011000" (=24), x2="011111" (=31), and y2="111011" 
(=59). The solution is similar to that in the previous example, except for the fact that now two AND 
gates are needed for x (one for x1 and the other for x2), which guarantee d='1' when those values are 
reached by the counter, and two NAND gates are needed for y (for y1 and y2), which guarantee d='0' 
when the respective values occur in the counter (see the circuit and truth table in Figure 14.20(c)). 
Again, for all control values (x1, x2, y1, y2) all bits must be monitored, including the '0's, with the 
exception of y2, for which only the '1's are needed (recall that this information is already available 
in the counter anyway, so this gate is actually not needed). With respect to monitoring all bits in the 
other three gates, there are cases when not all are needed, but then the whole truth table would have 
to be examined to verify that possibility. 

FIGURE 14.20. Four-window signal generator of Example 14.9. 
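
The extension to several windows can be sketched the same way: the DFF is set at every x value and cleared at every y value. The Python model below is a behavioral illustration only, with hypothetical function names and an assumed initial q='0'; it uses the control values of Example 14.9 and reports the resulting run lengths.

```python
def multi_window_generator(set_values, reset_values, modulus, cycles):
    """Counter plus one DFF: d = 1 at any set value, d = 0 at any reset value."""
    counter, q = 0, 0  # assumed reset state (hypothetical)
    waveform = []
    for _ in range(cycles):
        waveform.append(q)
        if counter in set_values:
            d = 1
        elif counter in reset_values:
            d = 0
        else:
            d = q
        q = d                                # rising clock edge
        counter = (counter + 1) % modulus
    return waveform


def run_lengths(bits):
    """Collapse a list of bits into [value, length] runs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs


# Example 14.9: x1 = 16, x2 = 31 set q; y1 = 24, y2 = 59 clear q.
wave = multi_window_generator({16, 31}, {24, 59}, modulus=60, cycles=120)
print(run_lengths(wave[60:120]))  # [[0, 17], [1, 8], [0, 7], [1, 28]]
```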


14.5 Frequency Dividers 


A clock (or frequency) divider is a particular type of two-window signal generator, which takes the 
clock as input and produces at the output a signal whose period (T) is a multiple of the clock period (T0). 
Depending on the application, the phase of the output signal might be required to be symmetric (duty 
cycle=50%). The design procedure described here is again a simplified procedure, so the same 
observations made in the previous section are valid here (like the need for an extra flip-flop and the use of finite 
state machines to minimize their number). However, contrary to Section 14.4, dual-edge generators will 
be included in this section, where the following five cases are described: 


Case 1: Divide-by-2^N 

Case 2: Divide-by-M with asymmetric phase 

Case 3: Divide-by-M with symmetric phase 

Case 4: Circuits with multiple dividers 

Case 5: High-speed frequency dividers (prescalers) 


Case 1 Divide-by-2^N 


To divide the clock frequency by 2^N (where N is a positive integer), the simplest solution is to use a 
regular (sequential) modulo-2^N binary counter (Sections 14.2 and 14.3), whose MSB will automatically 
resemble the desired waveform. In this case, the number of flip-flops will be minimal, and the output 
signal will exhibit symmetric phase (duty cycle=50%) automatically (see Figures 14.3(c) and 14.15(d)). 


Case 2 Divide-by-M with asymmetric phase 


To divide the clock by M (where M is a nonpower-of-2 integer), any modulo-M counter can be employed, 
which can be one of those seen in Sections 14.2 and 14.3 or any other, with the only restriction that it must 
possess exactly M states. In most implementations an output with asymmetric phase will result, as is the 
case when the counter is a sequential counter. 

This fact is illustrated in Figure 14.21. In Figure 14.21(a), a binary 0-to-4 counter is used to divide 
the clock by 5, while in Figure 14.21(b) a binary 0-to-5 counter is employed to divide it by 6. Notice that 
the MSB (q2) in both cases exhibits asymmetric phase. This, however, is not a problem in many applica- 
tions, particularly when all circuits are activated at the same clock edge (that is, all at the rising edge or 
all at the falling edge). 
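
The asymmetry of the MSB is easy to quantify. The following Python lines are a small illustrative sketch (function names are ours) that list the MSB of a sequential binary 0-to-(M-1) counter over one period and compute its duty cycle for M=5, M=6, and, for comparison, M=8.

```python
def msb_waveform(m):
    """MSB of a sequential binary 0-to-(m-1) counter over one full period."""
    n_bits = max(1, (m - 1).bit_length())   # flip-flops needed for m states
    return [(state >> (n_bits - 1)) & 1 for state in range(m)]


def duty_cycle(bits):
    """Fraction of the period during which the signal is high."""
    return sum(bits) / len(bits)


for m in (5, 6, 8):
    bits = msb_waveform(m)
    print(m, bits, duty_cycle(bits))
```

For M=5 the MSB is high during one of five clock periods, and for M=6 during two of six; only the power-of-2 case (M=8) gives a 50% duty cycle.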


Case 3 Divide-by-M with symmetric phase 


When phase symmetry is required, a more elaborate solution is needed. We will consider the most 
general case, in which M is odd (for M=even, see Exercise 14.39). One way of designing this circuit is 
based on the timing diagram of Figure 14.22, where M=9. Note that the desired signal (q) has transitions 
at both clock edges, so this is a dual-edge signal generator. 

FIGURE 14.21. Timing diagram for binary (a) 0-to-4 (M=5) and (b) 0-to-5 (M=6) counters. 

FIGURE 14.22. Timing diagram for a divide-by-9 with symmetric phase. 

One way of obtaining q is by first generating the signal called q1, which stays low during (M-1)/2 clock 
cycles and high during (M+1)/2 clock periods. A copy of this signal, called q2, is then created, which is 
one-half of a clock cycle behind q1. By ANDing these two signals, the desired output (q=q1·q2) results. Note 
that if q1 is glitch-free, then q is automatically guaranteed to be glitch-free because q1 and q2 cannot change 
at the same time (they operate at different clock edges). 

This design approach can then be summarized as follows: Suppose that M is odd and that no dual-edge 
DFFs are available. Take a regular (sequential) positive-edge 0-to-(M-1) counter (Sections 14.2 
and 14.3) and create a two-window signal (Section 14.4) that stays low during (M-1)/2 clock cycles and 
high during (M+1)/2 cycles (q1 in Figure 14.22). Make a copy of this signal into another DFF, operating 
at the falling edge of the clock (signal q2 in Figure 14.22). Finally, AND these two signals to produce the 
desired output (q=q1·q2). This design technique is illustrated in the example below. 


EXAMPLE 14.10 DIVIDE-BY-9 WITH SYMMETRIC PHASE 
Design a circuit that divides fclk by 9 and produces an output with 50% duty cycle. 


SOLUTION 


The timing diagram for this circuit is that of Figure 14.22. Because M=9, a regular (sequential) 
0-to-8 counter can be employed, from which a signal that stays low during four clock periods and 
high during five (q1) can be easily obtained (as in Example 14.8). Such a signal generator is shown 
within the dark box in Figure 14.23, with the OR+AND pair processing x (=3="0011" in this case, 
so q3'q2'q1q0 must be monitored) and the NAND+AND pair processing y (=8="1000", so q3q2'q1'q0' 
must be monitored; recall, however, that only the '1's are needed in the counter's last value, so the 
NAND can be replaced with an inverter). This circuit operates at the rising clock edge and produces 
q1. A delayed (by one half of a clock period) copy of this signal, q2, is produced by the second DFF, which 
operates at the negative transition of the clock. By ANDing these two signals, the desired waveform 
(q) results, which is guaranteed to be glitch-free because q1 and q2 are glitch-free (they come directly 
from flip-flops), and they can never change at the same time (we will discuss glitches in more detail 
in Chapter 15; see, for example, Section 15.3). 
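
The dual-edge procedure can be checked with a model that samples the signals twice per clock period. The Python sketch below is an assumption-laden illustration (hypothetical names, ideal zero-delay flip-flops, all signals starting at '0'): q1 updates at rising edges, q2 copies q1 at falling edges, and q=q1·q2.

```python
def symmetric_divider(m, periods):
    """Half-period-resolution model of the odd-M divider with symmetric phase.

    q1: two-window signal, low (m-1)/2 periods and high (m+1)/2 periods,
        stored in a rising-edge DFF.
    q2: copy of q1 captured by a falling-edge DFF (half a period later).
    Returns samples of q = q1 AND q2, two samples per clock period.
    """
    if m % 2 == 0:
        raise ValueError("this sketch covers odd M only")
    x = (m - 1) // 2 - 1   # counter value that sets q1 (3 for M = 9)
    y = m - 1              # counter value that clears q1 (8 for M = 9)
    counter, q1, q2 = 0, 0, 0
    samples = []
    for _ in range(periods):
        samples.append(q1 & q2)   # first half of the period (clock high)
        q2 = q1                   # falling edge: q2 captures q1
        samples.append(q1 & q2)   # second half of the period (clock low)
        if counter == x:          # rising edge: update q1 and the counter
            q1 = 1
        elif counter == y:
            q1 = 0
        counter = (counter + 1) % m
    return samples


q = symmetric_divider(9, 27)
one_period = q[18:36]             # 18 half-periods = 9 clock periods
print(one_period, sum(one_period) / len(one_period))
```

For M=9 the output is low for nine half-periods and high for nine, that is, 4.5 clock periods each, which is the 50% duty cycle requested in Example 14.10.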


FIGURE 14.23. Divide-by-9 circuit with symmetric phase (0-to-8 counter not shown; signal generator is within 
dark area). 


Case 4 Circuits with multiple dividers 


In certain applications a cascade of frequency dividers is needed. This is the case, for example, when 
we need to measure time. Suppose, for example, that we need to construct a timer that displays seconds. 
If fclk=1Hz, then a simple counter would do. This, of course, is never the case, because fclk is invariably 
in the multi-MHz or GHz range (for accuracy and practical purposes). The classical approach in this case 
is to use two (or more) counters with the first employed to reduce the frequency down to 1 Hz and the 
other(s) to provide the measurement of seconds. As will be shown in the example below, what the first 
counter in fact produces is a 1 Hz clock when it is asynchronous or a 1 Hz enable when it is synchronous. 


EXAMPLE 14.11 TIMER 


Present a diagram for a circular timer that counts seconds from 00 to 59. Assume that the clock 
frequency is fclk=F. The output should be displayed using two SSDs (seven-segment displays; see 
Example 11.4). 


SOLUTION 


This is another application of frequency dividers (counters), for which two solutions are presented, 
one asynchronous and the other synchronous (mixed solutions are also possible). 


i. Asynchronous circuit: If the clock frequency is not too high (so maximum performance is not 
required), a completely asynchronous circuit can be used, as depicted in Figure 14.24(a), where 
the output (MSB) of one stage serves as clock to the next. The role of counter1 is to reduce the 
frequency to 1 Hz, while the other two counters comprise a BCD (binary-coded decimal, Section 
2.4) counter, with counter2 running from 0 to 9 (hence counting seconds) and counter3 from 0 
to 5 (counting tens of seconds). The output of counter2 has 4 bits, while that of counter3 has 
3 bits. Both are converted to 7-bit signals by the SSD drivers (BCD-to-SSD converters) to feed the 
segments of the corresponding SSDs (see Example 11.4). 


ii. Synchronous circuit: This solution is depicted in Figure 14.24(b). Because now all counters are 
synchronous, the control is no longer done over the clock; instead, each counter must generate 
an enable signal to control the next circuit. Counter1 must produce ena1='1' when its state is F-1 
(recall that it is a 0-to-(F-1) counter), thus causing the next circuit (counter2) to be enabled during 
one out of every F clock periods. Likewise, counter2 must produce ena2='1' when its state is 9 (it 
is a 0-to-9 counter). These signals (ena1 and ena2) are then ANDed to produce the actual enable 
for counter3, causing it to be enabled during one out of every 10F clock periods. 
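
The enable chain of the synchronous solution can be illustrated with a short behavioral model. In the Python sketch below the function name is hypothetical and F is set to a deliberately small value (F=4) so that the result can be followed by hand; a real design would use the actual clock frequency.

```python
def synchronous_timer(f, clock_cycles):
    """Behavioral model of the synchronous 00-to-59 timer.

    counter1: 0..f-1, produces ena1 once every f clock periods
    counter2: 0..9 (seconds), enabled by ena1
    counter3: 0..5 (tens of seconds), enabled by ena1 AND ena2
    Returns (tens, units) after the given number of clock cycles.
    """
    c1 = c2 = c3 = 0
    for _ in range(clock_cycles):
        ena1 = (c1 == f - 1)
        ena2 = (c2 == 9)
        # All counters share the same clock edge; the enables are computed
        # from the values held before that edge.
        if ena1 and ena2:
            c3 = (c3 + 1) % 6
        if ena1:
            c2 = (c2 + 1) % 10
        c1 = (c1 + 1) % f
    return c3, c2


F = 4  # hypothetical small clock frequency, for illustration only
print(synchronous_timer(F, 59 * F))   # (5, 9)
print(synchronous_timer(F, 60 * F))   # (0, 0): the timer is circular
```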


FIGURE 14.24. 00-to-59-second timer: (a) Asynchronous; (b) Synchronous. 


Note that the timer above is a circular timer (when 59 is reached it restarts automatically from zero). 
From a practical point of view, additional features are generally needed, like the inclusion of a stop/ 
reset button, an alarm when 59s (or another value) is reached, programmability for the final value, etc. 
Although these features are not always simple to add when designing the circuit “by-hand,” it will be 
shown that with VHDL it is straightforward (see Section 22.3). 


Case 5 High-speed frequency dividers (prescalers) 


When very high speed is needed, special design techniques must be employed, which are discussed 
separately in the next section. 


14.6 PLL and Prescalers 


PLL (phase locked loop) circuits are employed for clock multiplication and clock filtration, among 
other applications. Even though a PLL is not a completely digital circuit, its increasing presence in 
high-performance digital systems makes its inclusion in digital courses indispensable. For instance, modern 
FPGAs (Section 18.4) are fabricated with several PLLs. 


14.6.1 Basic PLL 


A basic PLL is depicted in Figure 14.25, which generates a clock whose frequency (fout) is higher than 
that of the input (reference) clock (fin). The circuit operates as follows. The VCO (voltage-controlled 
oscillator) is an oscillator whose frequency is controlled by an external DC voltage. When operating 
alone, it generates a clock whose frequency is near the desired value, fout. This frequency is divided by 
M in the prescaler, resulting in floop=fout/M, which might be near fin but is neither precise nor stable. These 
two signals (floop and fin) are compared by the PFD (phase-frequency detector). If floop<fin, then the PFD 
commands the charge pump to increase the voltage applied to the VCO (this voltage is first filtered by a 
low-pass loop filter to attain a stable operation), thus causing fout to increase and, consequently, 
increasing floop as well. On the other hand, if floop>fin, then the opposite happens, that is, the PFD commands 
the charge pump to reduce the voltage sent to the VCO, hence reducing fout and, consequently, floop. In 
summary, the process stabilizes when the output frequency is "locked" at fout=M·fin. A PLL can be used 
as a simple ×2 multiplier or with much larger multiplication factors. For example, in Bluetooth radios, 
fin=1MHz and fout=2.4GHz, so the multiplication factor is M=2400. 

The internal construction of the PFD and charge pump is shown in Figure 14.26(a). fin and floop are 
connected to two positive-edge DFFs in the PFD. At the rising edge of fin, up='1' occurs. Likewise, at the rising 
edge of floop, down='1' happens. After both signals (up, down) are high, the AND gate produces a '1' that 
resets both flip-flops but only after a certain time delay, D. This procedure is illustrated in Figure 14.26(b). 
Note that initially floop<fin, so up='1' eventually occurs earlier than down='1' (that is, up stays active longer 
than down). The up and down signals control two switches in the charge pump. When up='1' and down='0', 
the upper current source charges the capacitor, C, increasing the voltage sent to the VCO. On the other 
hand, when up='0' and down='1', the lower current source discharges the capacitor, hence decreasing the 
control voltage. An additional RC branch is shown as part of the low-pass loop filter. 

FIGURE 14.25. Basic PLL. 

FIGURE 14.26. (a) Internal PFD and charge pump details; (b) Illustration of the frequency (and phase) locking 
procedure (initially, floop<fin). 


14.6.2 Prescaler 


The ÷M block in Figure 14.25 normally operates at a high frequency, hence requiring a special design 
technique. Such a high-speed, specially designed frequency divider is called a prescaler. (Note: Other 
definitions also exist, like the use of the words prescaler and postscaler to designate dividers placed outside 
the PLL circuit, that is, before and after it, respectively.) Observe, however, that only the DFFs in the 
initial stages of the prescaler must be specially designed because when the frequency is reduced down to a 
few hundred MHz, conventional flip-flops can be employed in the remaining stages (which are normally 
asynchronous). 

DFFs were studied in Sections 13.4-13.9. As mentioned there, prescalers operating with input fre- 
quencies in the 5GHz range [Shu02, Ali05, Yu05] have been successfully constructed using TSPC circuits 
(Figures 13.17(a)-(c)) and state of the art CMOS technology. For higher frequencies, SCL and ECL flip- 
flops are still the natural choices, in which case the transistors are fabricated using advanced techniques, 
like those described in Sections 8.7 and 9.8 (GaAs, SiGe, SOI, strained silicon, etc.). For example, pre- 
scalers operating over 15 GHz have been obtained with S-SCL (Figure 13.17(h)) [Ding05, Sanduleanu05], 
over 30GHz with SCL (Figure 13.17(h)) [Kromer06, Heydari06], and over 50GHz with ECL flip-flops 
(Figure 13.17(g)) [Griffith06, Wang06]. 

Besides the DFFs, another concern is with the inter-DFF propagation delays, which are minimized by 
using as few gates as possible. Moreover, whenever possible, such gates are inserted into the flip-flops 
(that is, are combined with the DFF’s circuit at the transistor level). 

The inter-DFF connections are illustrated in Figure 14.27(a), which shows a divide-by-8 circuit with 
only one gate, with fan-in=1, in the feedback path, which is as simple as it can get (the corresponding 
timing diagram is shown in Figure 14.27(b), where, for simplicity, the delays between the clock edges 
and the transitions that they provoke were neglected). With a slight modification, this circuit is turned 
into a divide-by-7 (Figure 14.27(c)), where again only one gate is observed in the feedback path (though 
now with fan-in=2). These two circuits are combined in Figure 14.27(d) to produce a programmable 
(through MC (mode control)) divide-by-7/8 circuit (the widely used notation 7/8 can be misleading; it 
means "7 or 8," not "7 over 8"). This last kind of circuit is referred to as a dual-modulus prescaler. 

FIGURE 14.27. (a) M=8 prescaler and (b) its timing diagram; (c) M=7 prescaler; (d) Dual-modulus prescaler 
(M=7 when MC='1', M=8 when MC='0'). 

Note also that the number of states in this kind of counter is much smaller than 2^N (where N is the 
number of flip-flops). Indeed, it is just 2N when M is even, or 2N-1 when M is odd. Therefore, ⌈M/2⌉ 
flip-flops are needed to implement a divide-by-M circuit. This can be confirmed in the timing diagram of 
Figure 14.27(b), relative to the circuit for M=8 (Figure 14.27(a)), where each signal starts repeating itself 
after eight clock pulses. As can also be seen, all bits, not only the last, have frequency fclk/8, and all exhibit 
symmetric phase. However, for M=odd, the phase is always asymmetric, with all bits staying low during (M-1)/2 
clock cycles and high during (M+1)/2 clock periods (Exercise 14.48). 
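
The state counts quoted above can be reproduced with a small shift-register model. The Python sketch below is a hedged illustration: the M=8 feedback is the single inverter described in the text, while the M=7 feedback is assumed here to be a NAND of the last two stages, one common choice that matches the stated low/high durations (the actual gate of Figure 14.27(c) is not reproduced in this text). Function and variable names are hypothetical.

```python
def prescaler_cycle(n_ffs, feedback):
    """Return the repeating state loop of a shift-register prescaler.

    The state is (q0, ..., qN-1); at each clock edge the register shifts by
    one position and feedback(state) enters the first flip-flop.
    """
    q = (0,) * n_ffs
    seen = []
    while q not in seen:
        seen.append(q)
        q = (feedback(q),) + q[:-1]
    return seen[seen.index(q):]   # drop any start-up states outside the loop


# M = 8: one inverter (fan-in = 1) in the feedback path.
divide_by_8 = prescaler_cycle(4, lambda q: 1 - q[3])
# M = 7: one two-input gate; NAND of the last two stages is an assumption.
divide_by_7 = prescaler_cycle(4, lambda q: 1 - (q[2] & q[3]))

print(len(divide_by_8), len(divide_by_7))                   # 8 and 7 states
print([sum(s[i] for s in divide_by_7) for i in range(4)])   # high periods per bit
```

With N=4 flip-flops the loops have 2N=8 and 2N-1=7 states. In the M=7 case every bit is high during four clock periods and low during three, that is, (M+1)/2 and (M-1)/2.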

Because only the initial stages of the prescaler must operate at very high frequency, the overall circuit 
is normally broken into two parts, one containing the high-speed section (always synchronous) and 
the other containing the remaining stages (normally asynchronous). This is depicted in Figure 14.28(a), 
where the first block is a synchronous dual-modulus divide-by-M1/(M1+1) counter and the second 
is an asynchronous divide-by-M2 counter (where M2 is a power of 2, so each stage is simply a 
divide-by-2, that is, a DFF operating as a TFF). The resulting circuit is a divide-by-M/(M+1) prescaler, where 
M=M1·M2. 

A divide-by-32/33 prescaler, implemented using the technique described above, is shown in Figure 14.28(b). 
The first block was constructed in a way similar to that in Figure 14.27, while the second is simply an 
asynchronous 0-to-7 counter (see Figure 14.16(d)). In this case, M1=4 and M2=8, so M=M1·M2=32. The circuit 
operates as follows. Suppose that MC='0'; then every time the second counter's output is "000" it produces X='1', 
which "inserts" the third DFF of the first counter into the circuit, causing that circuit to divide the clock by 5. 
During the other seven states of the second counter (that is, "001", "010", ..., "111"), node X remains low, which 
"removes" the third DFF of the first counter from the circuit (because then Y='1'), causing that circuit to divide 
the clock by 4. In summary, the synchronous counter divides the clock by 5 once and by 4 seven times, resulting 
in a divide-by-33 operation. When MC='1' is employed, X stays permanently low, in which case the 
synchronous circuit always divides the clock by 4. Therefore, because the second counter divides the resulting signal 
always by 8, a divide-by-32 operation results. 
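
The counting argument ("by 5 once and by 4 seven times") can be written out as a few lines of Python. This is only an arithmetic sketch with a hypothetical function name; it does not model the flip-flops or the nodes X and Y electrically.

```python
def input_clocks_per_output_cycle(mc, m1=4, m2=8):
    """Count input clock periods in one full cycle of the second counter.

    mc = 0: the first counter divides by m1+1 while the second counter is in
            state 0 (node X high) and by m1 in its other states.
    mc = 1: X stays low, so the first counter always divides by m1.
    """
    total = 0
    for state in range(m2):                 # states "000" through "111"
        x_high = (state == 0) and (mc == 0)
        total += (m1 + 1) if x_high else m1
    return total


print(input_clocks_per_output_cycle(mc=0))  # 33
print(input_clocks_per_output_cycle(mc=1))  # 32
```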


[Block diagram (a): clk → ÷M1/(M1+1) (synchronous) → ÷M2 (asynchronous) → out] 


FIGURE 14.28. (a) Typical approach to the construction of dual-modulus prescalers (the circuit is broken 
into two sections, the first synchronous, high speed, the second asynchronous); (b) Divide-by-32/33 prescaler 
(M=32 if MC='1', M=33 if MC='0'). 

FIGURE 14.29. Programmable PLL. 


14.6.3 Programmable PLL 


In many applications, the PLL needs to generate a programmable frequency instead of a fixed one. This is 
the case, for example, in multichannel radio systems for wireless data communication. The separation 
between the channels (which is also the separation between the programmable frequencies) defines 
the PLL resolution. Suppose, for example, that the output frequency is in the 2.4GHz range and that the 
channels must be 1 MHz apart. In this case, the PLL resolution has to be 1 MHz. This parameter defines 
the value of fin (because fout=M·fin, where M is an integer, fin has to be either 1 MHz or an integer divisor 
of 1 MHz). 

A programmable PLL is depicted in Figure 14.29, which has two major differences compared to the 
basic PLL of Figure 14.25. The first difference is at the input. Because the frequency of the system clock 
(fsys) is normally higher than that of the needed reference (fin=1MHz in the present example), an extra 
(low frequency) divider (÷R) is necessary to produce fin. The second difference is the presence of an 
additional counter (called counter AB) after the prescaler. This counter divides the frequency by B, producing 
a signal (MC) that stays low during A cycles and high during B-A cycles, where A and B are 
programmable parameters (note that this is simply a two-window signal generator, studied in Section 14.4). The 
waveform at the bottom of Figure 14.29 illustrates the type of signal expected at the output of counter 
AB, and it indicates ranges of programmability for its parameters (A, B). Because the prescaler divides 
its input by M+1 while MC='0', and by M while MC='1', the total division ratio, MT=fout/floop, is now 
MT=A(M+1)+(B-A)M=B·M+A. Recall also that fin=fsys/R, and when locked floop=fin. In summary: 


fout=(B·M+A)fin=(B·M+A)(fsys/R) 


For example, with M=32 (that is, a divide-by-32/33 prescaler) and 7 bits for counter AB, with A 
programmable from 0 to 31 and B programmable from 72 to 80, the whole range 2.3GHz<fout<2.6GHz can 
be covered in steps of 1 MHz (assuming that fin=1MHz). 
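
The 1 MHz step size can be verified by enumerating fout=(B·M+A)fin over the stated ranges of A and B. The short Python sketch below does just that; the function name is hypothetical and frequencies are expressed in MHz.

```python
def pll_output_mhz(a, b, m=32, f_in_mhz=1):
    """Locked output frequency fout = (B*M + A) * fin, in MHz."""
    return (b * m + a) * f_in_mhz


frequencies = sorted(
    {pll_output_mhz(a, b) for b in range(72, 81) for a in range(32)}
)
steps = {hi - lo for lo, hi in zip(frequencies, frequencies[1:])}
print(frequencies[0], frequencies[-1], steps)   # 2304 2591 {1}
```

With these settings the enumeration gives every integer multiple of 1 MHz from 2304 MHz (A=0, B=72) to 2591 MHz (A=31, B=80), with no gaps because A spans a full block of M=32 values for each B.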


14.7 Pseudo-Random Sequence Generators 


The generation of pseudo-random bit sequences is particularly useful in communication and computing 
systems. An example of application is in the construction of data scramblers (the use of scramblers was 
seen in Chapter 6, with detailed circuits shown in the next section) for either spectrum whitening or as 
part of an encryption system. In this type of application, the sequence must be pseudo-random, otherwise 
the original data would not be recoverable. 

Pseudo-random sequences are normally generated using a circuit called linear-feedback shift register 
(LFSR). As illustrated in Figure 14.30(a), it consists simply of a tapped circular shift register with the taps 
feeding a modulo-2 adder (XOR gate) whose output is fed back to the first flip-flop. The shift register 
must start from a nonzero state so the initialization can be done, for example, by presetting all flip-flops 
to '1' (note in Figure 14.30(a) that the reset signal is connected to the preset input of all DFFs), in which 
case the sequence produced by the circuit is that shown in Figure 14.30(b) (d=...0001001101011110...). 
Because the list in Figure 14.30(b) contains all N-bit vectors (except for "0000"), the circuit is said to be a 
maximal-length generator. Figure 14.30 also shows, in (c), a simplified representation for the circuit in (a); 
this type of representation was introduced in Sections 4.11 and 4.13 and will again be employed in the 
next section. 

Any pseudo-random sequence generator of this type is identified by means of a characteristic 
polynomial. For the case in Figure 14.30, the polynomial is 1+x^3+x^4 because the taps are derived after the third 
and fourth registers. Examples of other characteristic polynomials are given in Figure 14.31. 

FIGURE 14.30. (a) Four-stage LFSR-based pseudo-random sequence generator with polynomial 1+x^3+x^4; 
(b) Corresponding truth table; (c) Simplified representation depicting the flip-flops as simple blocks and the 
XOR gate as a modulo-2 adder. 

FIGURE 14.31. Examples of characteristic polynomials for LFSR-based pseudo-random sequence generators. 


EXAMPLE 14.12 PSEUDO-RANDOM SEQUENCE GENERATOR 


Consider the 4-bit pseudo-random sequence generator of Figure 14.30. The data contained in the 
accompanying truth table was obtained using "1111" as the LFSR’s initial state. Suppose that now a 
different initial state (="1000") is chosen. Show that the overall sequence produced by the circuit is 
still the same. 


SOLUTION 


Starting with "1000", the new sequence is {8, 4, 2, 9, 12, 6, 11, 5, 10, 13, 14, 15, 7, 3, 1, 8, ...}. Since the previous 
sequence was {15, 7, 3, 1, 8, 4, 2, 9, 12, 6, 11, 5, 10, 13, 14, 15, ...}, they are indeed circularly equal. 
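
Both sequences can be reproduced with a few lines of Python. The sketch below is a behavioral illustration with a hypothetical function name: the state is held as [q1, q2, q3, q4], the XOR of the tapped stages (3 and 4) enters the first flip-flop, and each state is reported as an integer with q1 as the MSB.

```python
def lfsr_states(initial, taps=(3, 4), count=16):
    """States visited by an LFSR whose tapped stages are XORed into stage 1."""
    state = [int(b) for b in initial]          # [q1, q2, ..., qN]
    values = []
    for _ in range(count):
        values.append(int("".join(map(str, state)), 2))
        feedback = 0
        for t in taps:                         # modulo-2 sum of the taps
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]        # shift; feedback enters q1
    return values


print(lfsr_states("1111"))
# [15, 7, 3, 1, 8, 4, 2, 9, 12, 6, 11, 5, 10, 13, 14, 15]
print(lfsr_states("1000"))
# [8, 4, 2, 9, 12, 6, 11, 5, 10, 13, 14, 15, 7, 3, 1, 8]
```

Each run visits all 15 nonzero 4-bit states before repeating, and the second list is a rotation of the first.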


14.8 Scramblers and Descramblers 


As mentioned in Sections 6.1 and 6.9, scramblers are circuits that pseudo-randomly change the values 
of some bits in a data block or stream with the purpose of “whitening” its spectrum (that is, spread it so 
that no strong spectral component will exist, thus reducing electromagnetic interference) or to introduce 
security (as part of an encryption procedure). The pseudo-randomness is normally accomplished using 
an LFSR circuit (described in the previous section). In this case, a scrambler is just an LFSR plus an addi- 
tional modulo-2 adder (XOR gate), and it is specified using the LFSR’s characteristic polynomial. 

There are two types of LFSR-based scramblers, called additive and multiplicative (recursive) scram- 
blers. Both are described below, along with their corresponding descramblers. 


14.8.1 Additive Scrambler-Descrambler 


Additive scramblers are also called synchronous (because they require the initial state of the scrambler 
and descrambler to be the same) or nonrecursive (because they do not have feedback loops). A circuit of 
this type is shown in Figure 14.32(a), where a simplified representation similar to that in Figure 14.30(c) 
was employed. Its characteristic polynomial (which is the LFSR's polynomial) is 1+x^9+x^11 because the 
taps are connected at the output of registers 9 and 11. This is the scrambler used in the 100Base-TX 
interface described in Section 6.1, which repeats its sequence after 2^N-1=2047 bits. 

Note that the LFSR is connected to the data stream by means of just an additional modulo-2 adder 
(XOR gate), where x(n) represents the data to be scrambled (at time n), k(n) represents the "key" 
produced by the LFSR, and c(n) represents the scrambled codeword. The corresponding descrambler is 
shown in Figure 14.32(b), whose circuit is exactly the same as that of the scrambler. 

Physically speaking, the circuit of Figure 14.32(a) simply causes the value of a bit in the main data stream 
(x) to be flipped when the LFSR produces a '1'. Therefore, if the circuit of Figure 14.32(b) is synchronized with 
that of Figure 14.32(a), it will cause the same bits to flip again, hence returning them to their original values. 

Formally, the recovery of x(n) can be shown as follows. At time n, c(n)=x(n)⊕k(n). Because 
k(n)=k(n-9)⊕k(n-11), c(n)=x(n)⊕k(n-9)⊕k(n-11) results. At the descrambler, y(n)=c(n)⊕k(n) is 
computed. Assuming that the two circuits are synchronized, y(n)=c(n)⊕k(n-9)⊕k(n-11) results, hence 
y(n)=x(n)⊕k(n-9)⊕k(n-11)⊕k(n-9)⊕k(n-11)=x(n). 
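
The derivation can be exercised with a behavioral model. The Python sketch below is an illustration under stated assumptions: the function names, the all-ones default seed, and the test data are ours; only the recurrence k(n)=k(n-9)⊕k(n-11) and the final XOR with the data come from the text. Because the descrambler is the same circuit, the same function is applied twice.

```python
def lfsr_key(n_bits, seed=None, taps=(9, 11), n_stages=11):
    """Key stream k(n) = k(n-9) XOR k(n-11) produced by an 11-stage LFSR."""
    state = list(seed) if seed is not None else [1] * n_stages  # assumed seed
    key = []
    for _ in range(n_bits):
        k = 0
        for t in taps:
            k ^= state[t - 1]        # state[t-1] holds k(n-t)
        key.append(k)
        state = [k] + state[:-1]
    return key


def additive_scramble(bits, seed=None):
    """XOR the data with the key; the same operation also descrambles."""
    return [b ^ k for b, k in zip(bits, lfsr_key(len(bits), seed))]


data = [int(b) for b in "101100010110" * 2]
scrambled = additive_scramble(data)
print(additive_scramble(scrambled) == data)                       # True
# A descrambler starting from a different state does not recover the data.
print(additive_scramble(scrambled, seed=[1] + [0] * 10) == data)  # False
```

The second call shows why synchronism matters: with a different initial state the two key streams differ, so the XORs no longer cancel.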

The disadvantage of this approach over that below is that synchronism is required (that is, both ends 
must start from the same initial state), which in practice is achieved by sending a sequence of known 
symbols. For example, in the Ethernet 100Base-TX interface (introduced in Section 6.1), where this 
particular circuit is employed, synchronism occurs after a sequence of ~20 idle symbols is sent. 

FIGURE 14.32. (a) Additive scrambler with polynomial 1+x^9+x^11 used in the Ethernet 100Base-TX interface 
and (b) corresponding descrambler. 


FIGURE 14.33. (a) Multiplicative scrambler with polynomial 1+x^9+x^11 and (b) corresponding descrambler. 


14.8.2 Multiplicative Scrambler-Descrambler 


Multiplicative scramblers are also called asynchronous (because they do not require LFSR 
synchronization) or recursive (because they have feedback loops). A scrambler-descrambler pair of this type is 
shown in Figure 14.33, again employing the LFSR with characteristic polynomial 1+x^9+x^11. 

The proof that y(n)=x(n) is as follows. At time n, c(n)=x(n)⊕k(n). Because k(n)=c(n-9)⊕ 
c(n-11), c(n)=x(n)⊕c(n-9)⊕c(n-11) results at the scrambler's output. At the descrambler, y(n)= 
c(n)⊕k(n) is computed. Because k(n)=c(n-9)⊕c(n-11), y(n)=c(n)⊕c(n-9)⊕c(n-11) results, hence 
y(n)=x(n)⊕c(n-9)⊕c(n-11)⊕c(n-9)⊕c(n-11)=x(n). 

This pair is self-synchronizing, meaning that they do not need to start from the same initial state. 
However, this process (self-synchronization) might take up to N bits (clock cycles), so the first N values 
of y(n) should be discarded. The disadvantage of this approach is that errors are multiplied by T+1, 
where T is the number of taps (T=2 in Figure 14.33). Therefore, if one bit is flipped by noise in the 
channel during transmission, 3 bits will be wrong after descrambling with the circuit of Figure 14.33(b). It is 
important to mention also that when the errors are less than N bits apart, fewer than T+1 errors per 
incorrect bit might result because the superposition of errors might cause some of the bits to be 
(unintentionally) corrected. 


EXAMPLE 14.13 MULTIPLICATIVE SCRAMBLER-DESCRAMBLER 


a. Sketch a circuit for a multiplicative scrambler-descrambler pair with polynomial 1+x^3+x^4. 


b. With the scrambler starting from "0000" and the descrambler from state "1111" (hence different 
initial states), process the data stream "101100010110" (starting from the left) and check whether 
y=x occurs (recall that it might take N clock cycles for the circuits to self-synchronize). 


c. Suppose now that an error occurs during transmission in the sixth bit of the scrambled code- 
word. Descramble it and show that now T+1=3 errors result. 


SOLUTION 


Part (a): 
The multiplicative scrambler-descrambler pair with polynomial 1+x^3+x^4 is shown in Figure 14.34(a). 


channel 


| registers | k_c | y | 
1111 ee 0 1 


o|-|0 


FIGURE 14.34. Multiplicative scrambler-descrambler pair with polynomial 1+x?+x* of Example 14.13; (b) The 
codeword c="101011100100" is produced by the scrambler when the data sequence is x="101100010110", 
with "0000" as the initial state; (c) Descrambling with initial state "1111" produces y=x after the first 4 bits; 
(d) An error was introduced in the 6th bit of the codeword, which became T+ 1=3 errors after descrambling 
(errors are indicated with circles). 
Part (b): 

The scrambler operation is summarized in the table of Figure 14.34(b), where the codeword 
c="101011100100" is produced when the data sequence is x="101100010110", using "0000" as the 
initial state. The descrambler operation, with initial state "1111", is summarized in the table of Figure 
14.34(c), where y= x occurs after the first N=4 bits. 


Part (c): 

The descrambling in this case is depicted in Figure 14.34(d), again with "1111" as the initial state. An 
error was introduced in the sixth bit of the received codeword, which became T+1=3 errors after 
descrambling (the errors are indicated with circles). 
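
The three parts of this example can also be checked numerically. The following is only a minimal behavioral sketch in Python, not a circuit description; the function names (scramble, descramble) and the convention that register cell t-1 holds c(n-t) are assumptions made for this illustration.

```python
def _feedback(reg, taps):
    """XOR of the register cells selected by the taps (reg[t-1] holds c(n-t))."""
    k = 0
    for t in taps:
        k ^= reg[t - 1]
    return k


def scramble(x_bits, taps, init_state):
    """Multiplicative scrambler: c(n) = x(n) XOR k(n), with k(n) built from past c values."""
    reg = list(init_state)
    c_bits = []
    for x in x_bits:
        c = x ^ _feedback(reg, taps)
        c_bits.append(c)
        reg = [c] + reg[:-1]  # the transmitted bit enters the shift register
    return c_bits


def descramble(c_bits, taps, init_state):
    """Descrambler: y(n) = c(n) XOR k(n), with k(n) built from past received bits."""
    reg = list(init_state)
    y_bits = []
    for c in c_bits:
        y_bits.append(c ^ _feedback(reg, taps))
        reg = [c] + reg[:-1]  # the received bit enters the shift register
    return y_bits


TAPS = (3, 4)  # polynomial 1 + x^3 + x^4
x = [int(b) for b in "101100010110"]
c = scramble(x, TAPS, [0, 0, 0, 0])      # part (b): scrambler starts from "0000"
y = descramble(c, TAPS, [1, 1, 1, 1])    # part (b): descrambler starts from "1111"

c_bad = list(c)
c_bad[5] ^= 1                            # part (c): flip the sixth bit in the channel
y_bad = descramble(c_bad, TAPS, [1, 1, 1, 1])
# Positions (counted from 1) that are wrong after the first N=4 bits
wrong = [n + 1 for n in range(4, len(x)) if y_bad[n] != x[n]]

print("c     =", "".join(map(str, c)))
print("y     =", "".join(map(str, y)))
print("wrong =", wrong)
```

Because the descrambler register is fed only by received bits, its contents equal those of the scrambler after N bits regardless of the initial state, which is why the comparison in the last lines skips the first four positions.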


14.9 Exercises 


1. Circular shift register 
Draw a diagram for a circular SR whose rotating sequence is "00110" (see Figure 14.2(c)). 
2. SR timing analysis 


Suppose that the propagation delay from clk to q in the DFFs employed to construct the SR of 
Figure 14.2(a) is tpCQ=5ns. Assuming that the circuit is submitted to the signals depicted in 
Figure E14.2, where the clock period is 30ns, draw the resulting output waveforms (adopt the simplified 
timing diagram style of Figure 4.8(b)). 




FIGURE E14.2. 
3. Event counter 


Consider the waveform x depicted in Figure E14.3. How can we design a circuit that counts all events 
that occur on x (that is, rising plus falling edges)? (Hint: Think about which signal could serve as the LSB.) 


FIGURE E14.3. 
4. Synchronous 0-to-31 counter with TFFs 
a. Draw a circuit for a synchronous 0-to-31 counter with parallel enable using regular TFFs. 


b. Repeat the design above, this time with serial enable. 
5. Synchronous 0-to-31 counter with DFFs 
a. Draw a circuit for a synchronous 0-to-31 counter with parallel enable using regular DFFs. 
b. Repeat the design above, this time with serial enable. 
6. Synchronous 0-to-255 counter with TFFs 
Draw a circuit for a synchronous 0-to-255 counter with serial enable using regular TFFs. 
7. Synchronous 0-to-255 counter with DFFs 
Draw a circuit for a synchronous 0-to-255 counter with serial enable using regular DFFs. 
8. Synchronous 0-to-4 counter with TFFs 
a. Design a synchronous 0-to-4 binary counter using regular TFFs (see Example 14.1). 
b. Draw a timing diagram for your circuit (consider that the propagation delays are negligible). 
9. Synchronous 0-to-4 counter with DFFs 
a. Design a synchronous 0-to-4 binary counter using regular DFFs (see Example 14.3). 
b. Draw a timing diagram for your circuit (consider that the propagation delays are negligible). 
10. Synchronous 0-to-4 counter using DFFs with clear 
a. Design a synchronous 0-to-4 binary counter using DFFs with a flip-flop clear port (see Example 14.4). 
b. Draw a timing diagram for your circuit (consider that the propagation delays are negligible). 
11. Synchronous 2-to-6 counter with DFFs 
a. Design a synchronous 2-to-6 binary counter using regular DFFs (see Example 14.5). 
b. Draw a timing diagram for your circuit (consider that the propagation delays are negligible). 
12. Synchronous 1-to-255 counter with DFFs 
Design a synchronous 1-to-255 sequential counter with serial enable using regular DFFs. 
13. Synchronous 8-to-15 counter with four DFFs 
Design a synchronous 8-to-15 sequential counter using four regular DFFs (see Example 14.5). 
14. Synchronous 8-to-15 counter with three DFFs 
Repeat the design above using the minimum number of DFFs (see Example 14.6). 
15. Synchronous 20-to-25 counter with five DFFs 
a. Design a synchronous 20-to-25 counter using five regular DFFs (see Example 14.5). 
b. Draw a timing diagram for your circuit (consider that the propagation delays are negligible). 
16. Synchronous 20-to-25 counter with three DFFs 
a. Design a synchronous 20-to-25 counter using the minimum number of DFFs. 
b. Is this circuit faster or slower than that in the previous exercise? 
c. How would you design it if it were a 2000-to-2005 counter? 
17. Synchronous 0-to-1023 counter with serial enable 
Draw a circuit for a synchronous 0-to-1023 binary counter with serial enable using regular DFFs. 
18. Synchronous 0-to-1023 counter with parallel enable 


Repeat the exercise above, this time with parallel enable. However, limit the number of DFFs per 
block to four (so more than one block will be needed—see Figure 14.14). 


19. Synchronous counter with enable #1 


Figure E14.19 shows the same synchronous counter of Figure 14.3(a), whose timing diagram is in 
Figure 14.3(c). The only difference is that now the toggle-enable port (t) of the first TFF has an enable 
(ena) signal connected to it. 


a. Is this small modification enough to control the counter, causing it to behave as a regular counter 
when ena='1' or remain stopped when ena='0'? If so, is this simplification valid also for the circuit 
in Figure 14.14(a), that is, can we replace the wire from Tin to t of all registers with just one 
connection from Tin to t of the first register? (Hint: Consider that ena='0' occurs while q0='1'.) 


b. Suppose that two counters of this type must be connected together to attain an 8-bit synchro- 
nous counter. Sketch the circuit, showing how your enable (ena) signal would be connected to 
the counters in this case. 


FIGURE E14.19. 


20. Synchronous counter with enable #2 


Consider the 0-to-9 counters designed with DFFs in Examples 14.3 and 14.4. Make the modifications 
needed to include in the circuits an “enable” port (the counter should operate as usual when ena ='1' 
or remain in the same state if ena='0'). This should be done for the following two circuits: 


a. Counter of Figure 14.11(b). 
b. Counter of Figure 14.10(d). 
21. Programmable 4-bit counter #1 


Design a programmable 0-to-M counter, where 0≤M≤15. The value of M should be set by a 
programmable 4-bit input, illustrated with "1100" (=12) in Figure E14.21. The counter must be 
synchronous and with serial enable. 
FIGURE E14.21. 


22. Programmable 4-bit counter #2 
Repeat the exercise above for a synchronous counter with parallel enable. 
23. Programmable 8-bit counter #1 


Design a programmable 0-to-M counter, where 0≤M≤255. The value of M should be set by a programmable 
8-bit input, similar to that in Figure E14.21 (but with 8 bits, of course). The counter must 
be synchronous and with serial enable. 


24. Programmable 8-bit counter #2 


Repeat the exercise above for a synchronous counter with parallel enable. The circuit should be 
composed of two 4-bit blocks (see Figure 14.14). 


25. Asynchronous 0-to-63 counter with DFFs 
Draw a circuit for an asynchronous 0-to-63 sequential counter (see Figure 14.16). 
26. Asynchronous 63-to-0 counter with DFFs 
Draw a circuit for an asynchronous downward 63-to-0 sequential counter (see Figure 14.15). 
27. Asynchronous 0-to-62 counter with DFFs 
Design an asynchronous 0-to-62 sequential counter (see Example 14.7). 
28. Asynchronous 0-to-255 counter with DFFs 
Design an asynchronous 0-to-255 sequential counter (see Figure 14.16). 
29. Asynchronous 0-to-254 counter with DFFs 
Design an asynchronous 0-to-254 sequential counter (see Example 14.7). 
30. Synchronized counter outputs 


Consider the diagram of Figure E14.30(a), which is relative to a sequential binary counter that must 
produce two outputs, x and y, with y always one unit behind x (that is, x=y+1). One (not good) 
solution is depicted in Figure E14.30(b), which shows two counters, the first reset to "00...01" and 
the second to "00...00" upon initialization. This solution uses too much (unnecessary) hardware. 
Devise a better circuit that solves this problem. 
FIGURE E14.30. 


31. Synchronized shift-register outputs 


Consider again the diagram of Figure E14.30(a), but suppose that now the output x comes from 
a circular shift register, whose rotating sequence is initialized with "1000". Suppose also that the 
sequence y must be delayed with respect to x by one clock period (so y should be initialized with 
"0001"). A (bad) solution analogous to that in Figure E14.30(b) could be employed, requiring two 
circular SRs. As in the exercise above, devise a better circuit that solves this problem. 


32. Two-window signal generator #1 


In Figure 14.19 the counter was omitted. Choose a counter and then redraw Figure 14.19 with it 
included. Eliminate any unnecessary gate. 


33. Two-window signal generator #2 


Design a circuit whose input is a clock signal and output is a signal similar to that of Figure 14.19(a), 
with two time windows, the first with width 20T0 and the second 10T0, where T0 is the clock 
period. 


34. Programmable two-window signal generator 


In the signal generator of Example 14.8, the time windows were 10T0 and 20T0. In the exercise 
above, they were 20T0 and 10T0. Draw a circuit for a programmable signal generator with an extra 
input, called sel (select), such that when sel='0' the signal generator of Example 14.8 results and 
when sel='1' that of Exercise 14.33 is implemented. 


35. Four-window signal generator 


Design a circuit whose input is a clock signal and output is a signal similar to that of Figure 14.20(a), 
with four time windows, with the following widths (from left to right): 5T0, 10T0, 15T0, and 20T0. 


36. PWM circuit 


A digital PWM (pulse width modulator) takes a clock waveform as input and delivers a pulse 
train with variable (programmable) duty cycle at the output. In the illustration of Figure E14.36 
the duty cycle is 2/7 (or 28.6%) because the output stays high during two out of every seven clock 
periods. Sketch a circuit for this PWM (the inputs are clk and duty, while the output is y; duty is 
a 3-bit signal that determines the duty cycle—see table in Figure E14.36). (Suggestion: Think of 
it as a two-window signal generator (as in Example 14.8) with the value of y fixed but that of x 
programmable.) 
FIGURE E14.36. 


37. Divide-by-8 circuit 

Design a circuit that divides the clock frequency (fclk) by M=8. Discuss the phase symmetry in this case. 
38. Divide-by-5 with symmetric phase 

a. Design a circuit that divides fclk by M=5 and produces an output with 50% duty cycle. 

b. Can you suggest an approach different from that in Section 14.5 (Example 14.10)? 
39. Divide-by-14 with symmetric phase 


Design a circuit that divides the clock frequency by M = 14 and exhibits symmetric phase. Because 
M is even here, which simplifications can be made with respect to the approach described in 
Section 14.5 (Example 14.10)? 


40. Two-digit BCD counter #1 


Figure E14.40 contains a partial sketch for a 2-digit BCD (binary-coded decimal, Section 2.4) coun- 
ter. Each stage has a 4-bit output that produces a decimal value between 0 and 9. Therefore, the 
circuit can count from 00 to 99, as described in the accompanying truth table. 


a. Make a sketch for this circuit using synchronous counters. What type of interaction must 
exist between the two counters? (Suggestion: See the timer in Example 14.11, with counter1 
removed.) 


b. Add two SSDs (seven-segment displays) to the circuit in order to numerically display the result 
(see Example 11.4). Include blocks for the BCD-to-SSD converters. 


FIGURE E14.40. 


41. Two-digit BCD counter #2 
Repeat the exercise above using asynchronous counters. 
42. 1 Hz signal 


A partial sketch for a system that derives, from a clock with frequency fclk=F, a 1 Hz waveform is 
depicted in Figure E14.42 (divide-by-F frequency divider). As can be observed, it is a 16-bit system 
that is composed of four blocks, each containing a synchronous 4-bit counter with parallel enable. 


a. Study this problem and then make a complete circuit sketch, given that fclk=50kHz. Add (or 
suppress) any units that you find (un)necessary. Provide as many circuit details as possible. 
Where does the output signal come from? Will its phase be automatically symmetric? 


b. What is the highest fclk that this system can accept while still producing a 1 Hz output? Will the 
phase be automatically symmetric in this case? 


FIGURE E14.42. 


43. Timer #1 


In Example 14.11 the construction of a circular 00-to-59 seconds timer was discussed. Assuming 
that fclk=5Hz, show a detailed circuit for each one of the three counters. Consider that they are all 
synchronous with serial enable. 


44. Timer #2 
Repeat the exercise above with asynchronous counters. 
45. Frequency meter #1 


This exercise deals with the design of a frequency meter. One alternative is depicted in Figure E14.45, 
where x is the signal whose frequency we want to measure, fx is the measurement, and clk is the system 
clock (whose frequency is fclk=F). The circuit contains two counters and a register. Counter1 creates, 
from clk, a signal called write that stays low during 1 second (that is, during F clock periods), and high 
during one clock period (T; see the accompanying timing diagram). While write='0', counter2 counts 
the events occurring on x, which are stored by the register when write changes from '0' to '1', at the 
same time that counter2 is reset. Note that counter1 is a two-window signal generator (Section 14.4). 


a. Is this approach adequate for the measurement of low or high (or both) frequencies? 


b. What is the inactivity factor of this approach (that is, the time fraction during which the circuit 
does not measure x) as a function of F? 


c. Present a detailed diagram (with internal block details) for this circuit. 
FIGURE E14.45. 


46. Frequency meter #2 


Another approach to the construction of frequency meters is exactly the opposite of that described 
above, that is, to count the number of clock cycles that occur between two (positive) edges of x 
instead of counting the number of events that occur on x during two edges of a signal generated 
from the clock. 


a. Draw a block diagram for this circuit (note that a divider is needed). 


b. What are the advantages and disadvantages of this approach compared to the previous one? Is 
it adequate for low or high (or both) frequencies? Does it require more or less hardware than the 
other? What is its inactivity factor? 


c. Show a detailed diagram for each block presented in part (a). 


d. Consider that the clock frequency is accurate enough such that the main error is due to the partial 
clock period that goes unaccounted for at the beginning and/or at the end of each cycle of x. 
Prove that, if the maximum frequency to be measured is fxmax, and the maximum error accepted 
in the measurement is e, then the clock frequency must obey fclk ≥ fxmax/e. 


47. PLL operation 


Describe the operation of a PLL (see Figure 14.25), considering that initially fout > fin. Draw a timing 
diagram analogous to that depicted in Figure 14.26(b) and show/explain how the system is eventually 
locked to the correct frequency. 


48. Divide-by-7 prescaler 


Draw the timing diagram for the prescaler of Figure 14.27(c). Confirm that it divides the clock fre- 
quency by M=7. During how many clock periods does the output stay low and during how many 
does it stay high? 


49. Dual-modulus prescaler #1 

Figure E14.49 shows a basic divide-by-M/(M +1) prescaler. 

a. Determine the value of M for MC='0' and for MC='1'. 

b. Draw a timing diagram (similar to that in Figure 14.27(b)) for MC='0'. 
c. Repeat part (b) for MC='1'. 
FIGURE E14.49. 


50. Dual-modulus prescaler #2 
Figure E14.50 shows a basic divide-by-M/(M +1) prescaler. 
a. Determine the value of M for MC='0' and for MC='1'. 
b. Draw a timing diagram (similar to that in Figure 14.27(b)) for MC='0'. 
c. Repeat part (b) for MC='1'. 


FIGURE E14.50. 


51. Dual-modulus prescaler #3 
Figure E14.51 shows a basic divide-by-M/(M + 1) prescaler. 
a. Determine the value of M for MC='0' and for MC='1'. 


b. Draw a timing diagram (similar to that in Figure 14.27(b)) for MC='0'. 
c. Repeat part (b) for MC='1'. 


FIGURE E14.51. 




52. Fifth-order pseudo-random sequence generator 


a. Draw an LFSR-based pseudo-random sequence generator defined by the polynomial of degree 
N=5 in Figure 14.31. 


b. Starting with all flip-flops preset to '1', write a table with the contents of all flip-flops until the 
sequence starts repeating itself. Add an extra column to the table and fill it with the decimal values 
corresponding to the flip-flops’ contents. Check whether this is a maximal-length generator. 


c. If the initialization had been done differently (for example, with q0q1q2q3q4="10000"), would the 
overall sequence produced by the circuit be any different? 


53. Additive data scrambler 


a. Make a sketch for an additive scrambler-descrambler pair (similar to that in Figure 14.32) whose 
LFSR is that in Figure 14.30. 


b. Suppose that the data sequence x="101010101010 ..." is processed by this scrambler. What is the 
resulting bit sequence (c) at its output? 


c. Pass this sequence (c) through the corresponding descrambler and confirm that x is recovered. 


54. Multiplicative data scrambler 


a. Make a sketch for a multiplicative scrambler-descrambler pair (similar to that in Figure 14.33) using 
an LFSR whose polynomial is 1+ .x°+x’. 


b. Suppose that the data sequence x="101010101010 ..." is processed by this scrambler. What is the 
resulting bit sequence (c) at its output? 


c. Pass this sequence (c) through the corresponding descrambler and confirm that x is recovered. 


14.10 Exercises with VHDL 


See Chapter 22, Section 22.7. 


14.11 Exercises with SPICE 


See Chapter 25, Section 25.16. 
Finite State Machines 


Objective: This chapter concludes the study of sequential circuits (initiated in Chapter 13). A formal 
design procedure, called finite state machine (FSM), is here introduced and extensively used. The FSM 
approach is very helpful in the design of sequential systems whose operation can be described by means 
of a well-defined (and preferably not too long) list containing all possible system states, along with the 
necessary conditions for the system to progress from one state to another, and also the output values 
that the system must produce in each state. This type of design will be further illustrated using VHDL 
in Chapter 23. 
15.1 Finite State Machine Model 


The specifications of a sequential system can be summarized by means of a state transition diagram, like 
that depicted in Figure 15.1. What it says is that the machine has four states, called stateA, stateB, stateC, 
and stateD; it has one output, called y, that must be '0' when in stateA, stateB, or stateC, or '1' when in 
stateD; and it has one input (besides clock, of course, and possibly reset), called x, which controls the state 
transitions (the machine should move from stateA to stateB at the next clock edge if x='1' or stay in stateA 
otherwise; similar transition information is provided also for the other states). 

In terms of hardware, a simplified model for an FSM is that shown in Figure 15.2(a). The lower section 
(sequential) contains all the sequential logic (that is, all flip-flops), while the upper section (combinational) 
contains only combinational circuitry. Because all flip-flops are in the lower section, only clock and reset are 
applied to it, as shown in the figure. The data currently stored in the flip-flops (called pr_state) represent 
the system’s present state, while the data to be stored by them in a future clock transition (called nx_state) 
represent the system’s next state. The upper section is responsible for processing pr_state, along with the 
circuit’s actual input, to produce nx_state as well as the system’s actual output. 


FIGURE 15.1. Example of state transition diagram. 


FIGURE 15.2. (a) Simplified FSM model (for the hardware); (b) Mealy machine; (c) Moore machine. 

To encode the states of an FSM, regular (sequential) binary code (Section 2.1) is often used, with the 
encoding done in the same order in which the states are enumerated (declared). For example, if with 
VHDL the following data type had been used in the example of Figure 15.1: 


TYPE machine_state IS (stateA, stateB, stateC, stateD); 


then the following binary words would be assigned to the states: stateA ="00", stateB="01", stateC="10", 
and stateD ="11". Other encoding styles also exist and will be described in Section 15.9. In the particular 
case of circuits synthesized onto CPLD/FPGA chips (Chapter 18), the one-hot style is often used because 
flip-flops are abundant in those devices. 
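
To make the two encoding styles mentioned above concrete, the following minimal Python sketch assigns code words to an enumerated list of states. The function names and the choice of which bit is "hot" for each state are assumptions made only for this illustration; synthesis tools may order one-hot bits differently.

```python
def sequential_encoding(states):
    """Regular binary code, assigned in the order in which the states are listed."""
    # (n - 1).bit_length() equals log2(n) rounded up for n >= 2
    n_bits = max(1, (len(states) - 1).bit_length())
    return {s: format(i, "0{}b".format(n_bits)) for i, s in enumerate(states)}


def one_hot_encoding(states):
    """One flip-flop per state; exactly one bit is '1' in each code word.
    Assumption: the first state uses the rightmost bit."""
    n = len(states)
    return {s: format(1 << i, "0{}b".format(n)) for i, s in enumerate(states)}


machine_state = ["stateA", "stateB", "stateC", "stateD"]
for name in machine_state:
    print(name,
          sequential_encoding(machine_state)[name],
          one_hot_encoding(machine_state)[name])
```

For the four states of Figure 15.1 the sequential code needs two flip-flops, whereas the one-hot code needs four, which is the trade-off alluded to above.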
One important aspect related to the FSM model is that, even though any sequential circuit can in 
principle be modeled as such, this is not always advantageous. A counter is a good example because it 
can have a huge number of states, making their enumeration impractical. As a simple rule of thumb, 
the FSM approach is advisable in systems whose tasks constitute a well-structured and preferably not 
too long list such that all states can be easily enumerated and specified. 


Mealy versus Moore machines 


A last comment about the FSM model regards its division into two categories, called Mealy and Moore 
machines. If the machine’s output depends not only on the stored (present) state but also on the current 
inputs, then it is a Mealy machine (Figure 15.2(b)). Otherwise, if it depends only on the stored state, it is a 
Moore machine (Figure 15.2(c)). A conventional counter is an example of the latter because its next output 
depends only on its current state (the circuit has no inputs—except for clock, of course). Note, however, 
that the former is more general, so most designs fall in that category. 


15.2 Design of Finite State Machines 


A design technique for FSMs, which consists of five steps, is described in this section. Several application 
examples follow. 


Step 1: Draw (or describe) the state transition diagram (as in Figure 15.1). 


Step 2: Based on the diagram above, write the truth tables for nx_state and for the output. Then 
rearrange the truth tables, replacing the states’ names with the corresponding binary values 
(recall that the minimum number of bits, and thus the number of flip-flops, needed to implement 
an FSM is log2(n) rounded up, where n is the number of states). 


Step 3: Extract, from the rearranged truth tables, the Boolean expressions for nx_state and for the output. 
Make sure that the expressions (either in SOP or POS form) are irreducible (Sections 5.3 and 5.4). 


Step 4: Draw the corresponding circuit, placing all flip-flops (D-type only) in the lower section 
and the combinational logic for the expressions derived above in the upper section (as in 
Figure 15.2). 


Step 5 (optional): When the circuit is subject to glitches at the output, but glitches are not 
acceptable, add an extra DFF for each output bit that must be freed from glitches. The extra 
DFF can operate either at the rising or falling edge of the clock. It must be observed that, 
due to the extra flip-flop, the new output will then be either one clock cycle or one-half of 
a clock cycle delayed with respect to the original output depending on whether a rising- or 
falling-edge DFF is employed, respectively (assuming that the original machine operates 
at the positive clock edge). 


It is important to mention that in this design procedure only D-type flip-flops are employed. If a TFF 
is needed, for example, then the combinational logic (from the upper section) that will be associated to 
the DFF will automatically resemble a TFF (this fact will be illustrated in Example 15.3). 


EXAMPLE 15.1 A BASIC FSM 


The state transition diagram of an FSM is depicted in Figure 15.3. It contains three states (A, B, 
and C), one output (y), and one input (x). When in state A, it must produce y='0' at the output, 
proceeding to state B at the next positive clock transition (assuming that it is a positive-edge machine) 
if the input is x='0' at that moment, or staying in state A otherwise. When in B, again y='0' must 
be produced, proceeding to C only if x='1' when the positive edge of clk occurs. Finally, if in C, it 
must cause y=x' and return to A at the next rising edge of clk regardless of x. Design a circuit that 
implements this FSM. 


FIGURE 15.3. FSM of Example 15.1. 


SOLUTION 


The solution, using the formal FSM procedure, is presented below. 


Step 1: This step consists of constructing the state transition diagram, which was already provided 
in Figure 15.3. 


Step 2: From the specifications contained in the state transition diagram, the truth table for 
nx_state and y can be obtained and is shown in Figure 15.4(a). This truth table was then 
rearranged in Figure 15.4(b), where all bits are shown explicitly. Because there are three 
states, at least 2 bits (⌈log2 3⌉=2 flip-flops) are needed to represent them. Also, because 
nx_state is always connected to the inputs of the flip-flops (all of type D) and pr_state to 
their outputs, the pairs of bits d1d0 and q1q0 were used to represent nx_state and pr_state, 
respectively. 


Truth table for y and nx_state 

    Inputs          Outputs 
pr_state   x      y   nx_state 
   A       0      0      B 
   A       1      0      A 
   B       0      0      B 
   B       1      0      C 
   C       0      1      A 
   C       1      0      A 
(a) 

    Inputs          Outputs 
pr_state   x      y   nx_state 
 q1  q0               d1  d0 
 0   0     0      0    0   1 
 0   0     1      0    0   0 
 0   1     0      0    0   1 
 0   1     1      0    1   0 
 1   0     0      1    0   0 
 1   0     1      0    0   0 
 1   1     X      X    X   X 
(b) 

Karnaugh map for y 

| x \ q1q0 | 00 | 01 | 11 | 10 | 
|    0     |  0 |  0 |  X |  1 | 
|    1     |  0 |  0 |  X |  0 | 
(c) 


FIGURE 15.4. (a) and (b) Truth tables and (c) Karnaugh maps for the FSM of Figure 15.3. 
Step 3: We must derive, from the truth table, the Boolean expressions for y and nx_state. Applying 
the trio q1q0x of Figure 15.4(b) into three Karnaugh maps (Section 5.6), one for y, one for d1, 
and another for d0 (Figure 15.4(c)), the following irreducible SOP expressions (Section 5.3) 
result: 


y=q1·x' 
d1=q0·x 
d0=q1'·x' 

Step 4: Finally, we can draw the circuit corresponding to the conclusions obtained above. The 
result is shown in Figure 15.5(a), with the DFFs in the lower part and the combinational 
logic implementing the expressions just derived in the upper part. To clarify the rela- 
tionship with the general FSM model of Figure 15.2 even further, the circuit was slightly 
rearranged in Figure 15.5(b), where all signals (x, y, pr_state, and nx_state) can be more 
clearly observed. 


Step 5: There is no specification concerning glitches. Moreover, this system is not subject to glitches 
anyway, and the output in state C must follow x (y=x’), so Step 5 should not be employed. 


FIGURE 15.5. (a) Circuit that implements the FSM of Figure 15.3; (b) Rearranged version. 
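
The result of Steps 2 and 3 can be cross-checked by comparing the expressions y=q1·x', d1=q0·x, and d0=q1'·x' against the specification of Figure 15.3 for every state and input value. The Python sketch below is only a behavioral check under the encoding A="00", B="01", C="10"; the names SPEC, CODE, logic, matches_spec, and run are hypothetical helpers introduced for this illustration.

```python
# Specification of Example 15.1: (present state, x) -> (y, next state)
SPEC = {
    ("A", 0): (0, "B"), ("A", 1): (0, "A"),
    ("B", 0): (0, "B"), ("B", 1): (0, "C"),
    ("C", 0): (1, "A"), ("C", 1): (0, "A"),
}
CODE = {"A": (0, 0), "B": (0, 1), "C": (1, 0)}  # state -> (q1, q0)


def logic(q1, q0, x):
    """Upper (combinational) section: y = q1.x', d1 = q0.x, d0 = q1'.x'."""
    y = q1 & (1 - x)
    d1 = q0 & x
    d0 = (1 - q1) & (1 - x)
    return y, d1, d0


def matches_spec():
    """True if the expressions reproduce every row of the truth table."""
    for (state, x), (y_spec, nx_state) in SPEC.items():
        q1, q0 = CODE[state]
        y, d1, d0 = logic(q1, q0, x)
        if y != y_spec or (d1, d0) != CODE[nx_state]:
            return False
    return True


def run(x_bits, state="A"):
    """Output y observed in each clock cycle; the DFFs load (d1, d0) at each clock edge."""
    q1, q0 = CODE[state]
    ys = []
    for x in x_bits:
        y, d1, d0 = logic(q1, q0, x)
        ys.append(y)
        q1, q0 = d1, d0
    return ys


print(matches_spec())
print(run([0, 1, 0, 1]))
```

Note that y is computed from both the present state and x, so the model behaves as a Mealy machine, in agreement with the requirement y=x' in state C.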


EXAMPLE 15.2 THE SMALLEST AND SIMPLEST FSM 
It is well known that a DFF is a two-state FSM. Verify this observation using the FSM model. 


SOLUTION 


Step 1: The state transition diagram of such a machine is shown in Figure 15.6. It contains only 
two states, A and B, and the values that must be produced at the output are y='0' when in 
state A, or y='1' when in state B. The transition from one state to the other is controlled by 
x (so x is an input). Because x='1' causes y='1', and x='0' causes y='0' (at the next rising 
edge of clk, assuming that it is a positive-edge circuit), then y is simply a synchronous copy 
of x (hence a DFF). 


FIGURE 15.6. The smallest and simplest FSM (a DFF). 


Step 2: From the diagram of Figure 15.6, the truth tables of Figure 15.7(a) were extracted. Because 
there are only two states, only one bit (one flip-flop) is needed to represent them. Assuming 
that the states A and B were declared in that order, the corresponding encoding then is A='0', 
B='1', shown in the rearranged truth tables of Figure 15.7(b). 


Step 3: From the first truth table, we conclude that y=pr_state, and in the second we verify that 
nx_state=x. 


Step 4: Finally, applying these pieces of information into the general FSM model of Figure 15.2, 
results in the circuit of Figure 15.7(c), which, as expected, is simply a DFF. Step 5 is not 
needed because the output already comes directly from a flip-flop. 


FIGURE 15.7. Implementation for the FSM of Figure 15.6: (a) Original and (b) rearranged truth tables; 
(c) Corresponding circuit, disposed according to the general model of Figure 15.2. 
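
The conclusions of Step 3 (y=pr_state and nx_state=x) can be illustrated with a very small behavioral model. This is only a sketch in Python; the function name dff_fsm and the default initial state '0' (state A) are assumptions made for the illustration.

```python
def dff_fsm(x_bits, pr_state=0):
    """Two-state FSM of Figure 15.6: y = pr_state, nx_state = x."""
    ys = []
    for x in x_bits:
        ys.append(pr_state)  # Moore output: depends only on the stored state
        pr_state = x         # nx_state = x is stored at the clock edge
    return ys


# y reproduces x one clock cycle later, which is the behavior of a DFF
print(dff_fsm([1, 0, 1, 1, 0]))
```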


EXAMPLE 15.3 COUNTER—A CLASSICAL FSM 


As mentioned in Section 14.2, counters are among the most frequently used sequential circuits. For 
that reason, even though their design was already studied in Sections 14.2 and 14.3, some of it is 
repeated here, now using the FSM approach. One important aspect to be observed regards the flip- 
flop type. Because the present design procedure employs only D-type flip-flops (DFFs), and counters 
often need T-type flip-flops (TFFs), it is then expected that the combinational logic generated dur- 
ing the design (upper section), when combined with the DFFs (lower section), will resemble TFFs. 
Design a modulo-8 counter and analyze the resulting circuit to confirm (or not) this fact. 


SOLUTION 


Step 1: The state transition diagram for a 0-to-7 counter is shown in Figure 15.8. Its states are named 
zero, one, ..., seven, each name corresponding to the decimal value produced by the counter 
when in that state. This is a Moore machine because there are no control inputs (only clock 
and, possibly, reset). 


FIGURE 15.8. State transition diagram for a full-scale 3-bit counter. 


Step 2: From the diagram of Figure 15.8, the truth table for nx_state was obtained (Figure 15.9(a)), 
which was then modified in Figure 15.9(b) to show all bits explicitly. 


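(a) Truth table for nx_state 

pr_state    nx_state 
zero        one 
one         two 
two         three 
three       four 
four        five 
five        six 
six         seven 
seven       zero 

(b) Truth table for nx_state with all bits shown 

pr_state    nx_state 
q2 q1 q0    d2 d1 d0 
0  0  0     0  0  1 
0  0  1     0  1  0 
0  1  0     0  1  1 
0  1  1     1  0  0 
1  0  0     1  0  1 
1  0  1     1  1  0 
1  1  0     1  1  1 
1  1  1     0  0  0 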

FIGURE 15.9. Truth table for the FSM of Figure 15.8. 


Step 3: From the truth table, using Karnaugh maps, the following irreducible SOP expressions are 
obtained: 


$d_2 = q_2 \oplus (q_1 \cdot q_0)$ 
$d_1 = q_1 \oplus q_0$ 
$d_0 = q_0'$ 

Step 4: Using the information above, the circuit of Figure 15.10(a) was constructed. Again, only 
D-type flip-flops were employed, all located in the lower section, while the combinational 
logic for the expressions derived in Step 3 was placed in the upper section. Step 5 is not 
needed because the outputs already come straight from flip-flops. Notice that the circuit 
closely resembles the general model of Figure 15.2. 


FIGURE 15.10. Circuit that implements the FSM of Figure 15.8: (a) Designed circuit; (b) and (c) Rearranged 
versions for comparison with synchronous counters seen in Section 14.2 (Figures 14.5(a) and 14.3(a), 
respectively). 


To conclude, we want to compare this counter with those in Section 14.2. Rearranging the 
circuit of Figure 15.10(a) yields that of Figure 15.10(b). The portions inside dark boxes are TFFs 
(see Figure 13.24(b)), so an equivalent representation is shown in Figure 15.10(c). Comparing 
Figure 15.10(b) with Figure 14.5(a), or Figure 15.10(c) with Figure 14.3(a), we observe that, as 
expected, the circuits are alike. 
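
The TFF observation can also be checked numerically. The sketch below is only an illustration (the function names are ours, not the book's): it models the upper section with the Step 3 expressions, d2 = q2 XOR (q1 AND q0), d1 = q1 XOR q0, d0 = NOT q0, models the lower section as three DFFs that copy d to q at each clock edge, and confirms that each DFF behaves as a TFF whose toggle input is the AND of all lower-order bits.

```python
def next_state(q2, q1, q0):
    """Upper (combinational) section: Step 3 expressions of Example 15.3."""
    d2 = q2 ^ (q1 & q0)
    d1 = q1 ^ q0
    d0 = 1 - q0  # q0'
    return d2, d1, d0


def run_counter(cycles, start=(0, 0, 0)):
    """Lower (sequential) section: three DFFs sharing one clock."""
    state, values = start, []
    for _ in range(cycles):
        q2, q1, q0 = state
        values.append(4 * q2 + 2 * q1 + q0)
        state = next_state(q2, q1, q0)  # q <= d at the clock edge
    return values


def toggle_inputs(q2, q1, q0):
    """Toggle inputs (t2, t1, t0) of the equivalent TFF-based counter."""
    return q1 & q0, q0, 1


if __name__ == "__main__":
    print(run_counter(10))
    for n in range(8):
        q = ((n >> 2) & 1, (n >> 1) & 1, n & 1)
        t = toggle_inputs(*q)
        # A DFF fed with d = q XOR t is a TFF with toggle input t.
        assert next_state(*q) == tuple(qi ^ ti for qi, ti in zip(q, t))
```

Running it lists the count sequence starting from zero; the loop at the end is the software counterpart of the rearrangement shown in Figure 15.10(b) and (c).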


EXAMPLE 15.4 SYNCHRONOUS 3-TO-9 COUNTER 


This example deals with the design of a synchronous modulo-M counter (where $M<2^N$) with a 
nonzero initial value. Apply the formal FSM design procedure to design a synchronous 3-to-9 
counter. Afterward, compare the results (equations) with those obtained in Example 14.5. 


SOLUTION 


Step 1: The state transition diagram is shown in Figure 15.11. The states are called three, four, 
five,...,nine. The values that must be produced at the output in each state are shown 
between parentheses. 


Step 2: From the state transition diagram, the truth table for nx_state is obtained (included in 
Figure 15.11). 
Step 3: From the truth table, the Karnaugh maps of Figure 15.12 are obtained, from which the 
following irreducible SOP expressions result for nx_state: 


$d_3 = q_3 \cdot q_0' + q_2 \cdot q_1 \cdot q_0$ 
$d_2 = q_2 \cdot q_1' + q_2 \cdot q_0' + q_2' \cdot q_1$ 
$d_1 = q_1' \cdot q_0 + q_1 \cdot q_0'$ 
$d_0 = q_3 + q_0'$ 


Step 4: The circuit implementing the expressions above is depicted in Figure 15.12. Again, Step 5 is 
not needed because the outputs come directly from flip-flops. And, once again, the circuit 
resembles the general FSM model of Figure 15.2. 


Truth table for nx_state 

pr_state       nx_state 
q3 q2 q1 q0    d3 d2 d1 d0 
0  0  1  1     0  1  0  0 
0  1  0  0     0  1  0  1 
0  1  0  1     0  1  1  0 
0  1  1  0     0  1  1  1 
0  1  1  1     1  0  0  0 
1  0  0  0     1  0  0  1 
1  0  0  1     0  0  1  1 

FIGURE 15.11. State transition diagram for a synchronous 3-to-9 counter (Example 15.4) and corresponding 
truth table. 


[Karnaugh maps for d3, d2, d1, and d0, followed by the resulting circuit.] 


FIGURE 15.12. Karnaugh maps and circuit that implements the FSM of Figure 15.11 (a synchronous 3-to-9 
counter). 
Finally, we want to compare it to a similar counter designed in Section 14.2 (Example 14.5, 
Figure 14.12). The equations obtained in that example were the following: 

$d_3 = q_3 \cdot q_0' + q_3' \cdot q_2 \cdot q_1 \cdot q_0$ 
$d_2 = q_2 \oplus (q_1 \cdot q_0) = q_2 \cdot q_1' + q_2 \cdot q_0' + q_2' \cdot q_1 \cdot q_0$ 
$d_1 = q_1 \oplus q_0 = q_1' \cdot q_0 + q_1 \cdot q_0'$ 
$d_0 = q_3 + q_0'$ 

Comparing these equations to those obtained using the FSM approach, we observe that $d_1$ and $d_0$ 
are equal, but $d_3$ and $d_2$ are simpler in the latter. The reason is simple: in the FSM approach, we 
took advantage of the states that never occur in the counter. Because it counts from 3 to 9 and 
employs four flip-flops, there are nine nonoccurring states (0, 1, 2, 10, 11, 12, 13, 14, and 15), 
which were not taken into consideration in the simplified design of Example 14.5. 

Note: Indeed, a small part of the "don't care" states was taken advantage of in Example 14.5 when 
adopting the simplification that "only the '1's need to be monitored in the final value because no other previous 
state exhibits the same pattern of '1's" (that is, monitoring only the '1's leads potentially to smaller 
expressions than monitoring the '1's and the '0's). To confirm this fact, the reader is invited to write the values 
obtained with the expressions of Example 14.5 in the Karnaugh maps of Figure 15.12. 
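
The comparison can be confirmed by brute force. The sketch below is a hedged illustration (helper names are ours): it encodes the FSM-based expressions, d3 = q3·q0' + q2·q1·q0, d2 = q2·q1' + q2·q0' + q2'·q1, d1 = q1 XOR q0, d0 = q3 + q0', and the Example 14.5 expressions, d3 = q3·q0' + q3'·q2·q1·q0, d2 = q2 XOR (q1·q0), with d1 and d0 unchanged. It then checks that both produce the 3-to-9 sequence and that they disagree only in nonoccurring states.

```python
def bits(n):
    """Split a 4-bit value into (q3, q2, q1, q0)."""
    return (n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1


def value(d3, d2, d1, d0):
    return 8 * d3 + 4 * d2 + 2 * d1 + d0


def fsm_next(n):
    """Next state using the expressions of Example 15.4 (FSM approach)."""
    q3, q2, q1, q0 = bits(n)
    n2, n1, n0 = 1 - q2, 1 - q1, 1 - q0
    d3 = (q3 & n0) | (q2 & q1 & q0)
    d2 = (q2 & n1) | (q2 & n0) | (n2 & q1)
    d1 = (n1 & q0) | (q1 & n0)
    d0 = q3 | n0
    return value(d3, d2, d1, d0)


def direct_next(n):
    """Next state using the expressions of Example 14.5 (direct design)."""
    q3, q2, q1, q0 = bits(n)
    n3, n0 = 1 - q3, 1 - q0
    d3 = (q3 & n0) | (n3 & q2 & q1 & q0)
    d2 = q2 ^ (q1 & q0)
    d1 = q1 ^ q0
    d0 = q3 | n0
    return value(d3, d2, d1, d0)


def sequence(step, start=3, cycles=8):
    out, n = [], start
    for _ in range(cycles):
        out.append(n)
        n = step(n)
    return out


if __name__ == "__main__":
    print(sequence(fsm_next))
    print(sequence(direct_next))
    # States in which the two designs disagree (all should be unused states).
    print([n for n in range(16) if fsm_next(n) != direct_next(n)])
```

Both designs agree in every state from 3 to 9; the FSM version is cheaper only because it assigns convenient values to the unused states.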


EXAMPLE 15.5 STRING DETECTOR 


Design a circuit that takes as input a serial bit stream and outputs a '1' whenever the sequence 
"111" occurs. Overlaps must also be considered, that is, if the input sequence is "...0111110...", for 
example, then the output sequence should be "...0001110...". Analyze whether the proposed 
solution is subject to glitches or not. (See also Exercises 15.5-15.7.) 


SOLUTION 


Step 1: The corresponding state transition diagram is depicted in Figure 15.13. A top-level block 
diagram is also shown, having x, clk, and rst as inputs and y as output. 

Step 2: From the state transition diagram, the truth tables of Figure 15.14(a) are extracted, which 
were modified in Figure 15.14(b) to show all bits explicitly. Because the machine has four 
states, at least two DFFs are needed, whose inputs are $d_0$ and $d_1$ (nx_state) and outputs are $q_0$ 
and $q_1$ (pr_state). 




FIGURE 15.13. State transition diagram for the FSM of Example 15.5. 
Step 3: With the help of Karnaugh maps (Figure 15.14(c)), the following irreducible SOP expressions 
for y and nx_state are obtained: 

$y = q_1 \cdot q_0$ 
$d_1 = q_1 \cdot x + q_0 \cdot x$ 
$d_0 = q_1 \cdot x + q_0' \cdot x$ 


Step 4: The circuit for this FSM is shown in Figure 15.15(a). The flip-flops are located in the lower 
(sequential) section, while the combinational logic needed to implement the equations above 
is in the upper (combinational) section. 


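Truth table for y 

pr_state    q1 q0    y 
zero        0  0     0 
one         0  1     0 
two         1  0     0 
three       1  1     1 

Truth table for nx_state 

pr_state    q1 q0    x    nx_state    d1 d0 
zero        0  0     0    zero        0  0 
zero        0  0     1    one         0  1 
one         0  1     0    zero        0  0 
one         0  1     1    two         1  0 
two         1  0     0    zero        0  0 
two         1  0     1    three       1  1 
three       1  1     0    zero        0  0 
three       1  1     1    three       1  1 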


FIGURE 15.14. (a) and (b) Truth tables and (c) Karnaugh maps for the FSM of Figure 15.13 (Example 15.5). 


FIGURE 15.15. FSM of Example 15.5 (Figure 15.13): (a) Circuit; (b) Timing diagram. 
Step 5: To illustrate the occurrence of glitches, a timing diagram was drawn in Figure 15.15(b), 
showing x="00110111100" as the input sequence, from which y="0000000110" is expected 
(notice that the values of x are updated on the falling edge of the clock, while the FSM 
operates on the rising edge). If we assume that the high-to-low transition of the DFFs is slower 
than the low-to-high transition, then both flip-flop outputs will be momentarily high at the 
same time when the machine moves from state one to state two, hence producing a brief spike 
(y='1') at the output (see the timing diagram). 


In conclusion, this circuit’s output is subject to glitches. If they are not acceptable, then Step 5 
must be included. There is, however, another solution for this particular example, which consists of 
employing a Gray code to encode the states of the FSM because then when the machine moves from 
one state to another only one bit changes (except when it returns from state two to state zero, but this 
is not a problem here). In other words, by encoding the states as zero="00", one="01", two="11", and 
three="10", glitches will be automatically prevented (it is left to the reader to confirm this fact). This, 
however, is normally feasible only for very small machines, as in the present example; in general, 
Step 5 is required when glitches must be prevented. (For more on this, see Exercises 15.5-15.7.) 
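
The functional behavior of the detector (glitches aside) can be reproduced with a short simulation. This is a minimal sketch under stated assumptions: sequential encoding (zero="00", one="01", two="10", three="11"), the Step 3 expressions y = q1·q0, d1 = q1·x + q0·x, d0 = q1·x + q0'·x, a reset to state zero, and y observed right after each clock edge. The function names are illustrative only.

```python
def step(q1, q0, x):
    """Combinational section: next-state expressions from Step 3."""
    d1 = (q1 & x) | (q0 & x)
    d0 = (q1 & x) | ((1 - q0) & x)
    return d1, d0


def detect(stream):
    """Clock the bits of `stream` into the FSM and record y after each edge."""
    q1 = q0 = 0  # assumed reset state: zero
    out = []
    for ch in stream:
        q1, q0 = step(q1, q0, int(ch))
        out.append(str(q1 & q0))  # y = q1 AND q0 (high only in state three)
    return "".join(out)


if __name__ == "__main__":
    print(detect("0111110"))
```

For the input "0111110" the sketch yields "0001110", which is the overlapping behavior requested in the problem statement. Being purely functional, it says nothing about the propagation-delay glitches discussed in Step 5.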


EXAMPLE 15.6 A BASIC SIGNAL GENERATOR 


A signal generator is depicted in Figure 15.16, which must derive, from clk, the signal y. As 
indicated in the figure, y must stay low during three clock periods ($3T_0$), and high during five clock 
periods ($5T_0$). Design an FSM that implements this signal generator. Recall that in any signal generator 
glitches are definitely not acceptable. 




FIGURE 15.16. Signal generator of Example 15.6. 


SOLUTION 


Step 1: The corresponding state transition diagram is depicted in Figure 15.17. The FSM has 
eight states (called zero, one, ..., seven), and must produce y='0' in the first three states 
and y='1' in the other five (notice that this approach would be inappropriate if the num- 
ber of clock periods were too large; FSMs with a large number of states are treated in 
Section 15.4). 


Step 2: From Figure 15.17, the truth table for y and nx_state can be easily obtained and is presented in 
Figure 15.18(a). The table was then rearranged with all bits shown explicitly in Figure 15.18(b). 
Notice that to encode eight states, three bits (so at least three flip-flops) are required, with their 
outputs ($q_2 q_1 q_0$) representing the present state (pr_state) and their inputs ($d_2 d_1 d_0$) representing 
the next state (nx_state) of the machine. 


FIGURE 15.17. State transition diagram for the signal generator of Figure 15.16 (Example 15.6). 


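Truth table for y and nx_state 

pr_state    q2 q1 q0    y    nx_state    d2 d1 d0 
zero        0  0  0     0    one         0  0  1 
one         0  0  1     0    two         0  1  0 
two         0  1  0     0    three       0  1  1 
three       0  1  1     1    four        1  0  0 
four        1  0  0     1    five        1  0  1 
five        1  0  1     1    six         1  1  0 
six         1  1  0     1    seven       1  1  1 
seven       1  1  1     1    zero        0  0  0 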


FIGURE 15.18. Truth table for the FSM of Figure 15.17. 


Step 3: From the truth table, with the help of Karnaugh maps, the following irreducible SOP 
expressions are obtained: 


$y = q_2 + q_1 \cdot q_0$ 
$d_2 = q_2 \oplus (q_1 \cdot q_0)$ 
$d_1 = q_1 \oplus q_0$ 
$d_0 = q_0'$ 
Step 4: The circuit implementing the expressions above can now be drawn and is shown in 
Figure 15.19(a). Again, the circuit directly resembles the general FSM model of Figure 15.2. 

Step 5: The gate network that computes y has several inputs, all of which might change at the same 
time. Such changes, however, are never perfectly simultaneous. Suppose, for example, that due to slightly 
different internal propagation delays the outputs of the three DFFs change in the following order: 
$q_0$, then $q_1$, then $q_2$. If so, when the system moves from state three to state four, for example, it 
goes momentarily through states two and zero, during which y='0' is produced (see Figure 15.20). 
Because y was supposed to stay high during that transition, such a '0' constitutes a glitch. Then 
Step 5 is needed to complete the design, which adds the extra flip-flop seen in Figure 15.19(b). 

FIGURE 15.19. (a) Solution of Example 15.6 using only steps 1-4; (b) Step 5 included to eliminate glitches at 
the output. 


FIGURE 15.20. Glitch formation when the system of Figure 15.19(a) moves from state three to state four 
(states two and zero occur during that transition, causing a brief y='0' pulse). 
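
The glitch mechanism of Example 15.6 can be traced with a few lines of code. The sketch below is illustrative only and assumes, as in Step 5, that the flip-flop outputs settle one at a time in the order q0, q1, q2. It lists the codes visited while the state register moves from one state to another and evaluates y = q2 + q1·q0 at each of them.

```python
def output(state):
    """Output logic of Example 15.6: y = q2 + q1*q0."""
    q2, q1, q0 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    return q2 | (q1 & q0)


def transient_path(old, new, order=(0, 1, 2)):
    """Codes visited when the differing bits settle one at a time in `order`."""
    path, current = [old], old
    for bit in order:
        mask = 1 << bit
        if (current ^ new) & mask:  # this flip-flop has to change
            current ^= mask
            path.append(current)
    return path


def has_glitch(old, new, order=(0, 1, 2)):
    """True if y takes a value that differs from both its start and end value."""
    ys = [output(s) for s in transient_path(old, new, order)]
    return any(y != ys[0] and y != ys[-1] for y in ys)


if __name__ == "__main__":
    path = transient_path(3, 4)
    print(path, [output(s) for s in path])
    print(has_glitch(3, 4))
```

For the transition from three to four the path is three, two, zero, four, so y drops to '0' twice before returning to '1', as in Figure 15.20. Registering y (Step 5) hides these intermediate values because the extra flip-flop samples y only after the state bits have settled.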


15.3 System Resolution and Glitches 


Resolution of sequential systems: Given a synchronous system whose registers are all controlled by the same 
clock signal, it is clear that its resolution cannot be better than one half of a clock period. This would be 
the case for units operating at both clock edges. For units employing only single-edge flip-flops (which 
is generally the case), all operating at the same clock transition (that is, all operating at the positive clock 
edge or all at the negative edge), the resolution is worse, that is, it is one whole clock period, meaning 
that any event lasting less than one clock cycle might simply go unnoticed. 

Resolution of combinational systems: Combinational logic circuits, being unregistered (hence unclocked), 
are not subject to the limitation described above. Their outputs are allowed to change freely in response 
to changes at the inputs with the resolution limited only by the circuit's internal propagation delays. 
However, if the system is mixed (that is, with sequential and combinational circuits), then the resolution 
is generally determined by the clocked units. 

Glitches: Glitches are undesired spikes (either upward or downward) that might corrupt a signal dur- 
ing state transitions, causing it to be momentarily incorrect (as in Examples 15.5 and 15.6). 

Although simple, these concepts are very important. For example, we know that one technique to 
remove glitches is to add extra registers (Step 5 of the FSM design procedure). However, if in the system 
specifications an output is combinationally tied to an input, then that output should not be stored for the 
obvious reason that a change at the input lasting less than one (or one half of a) clock period might not 
be perceived at the output. As an example, let us observe Example 15.1 once again. Its resolution is one 
clock period (because all flip-flops are single-edge registers). However, in the specification of state C 
(Figure 15.3), it is said that while in that state the output should be y= x’. Because there is no constraint 
preventing x from lasting less than one clock period, y should not be stored if maximum resolution is 
desired. In other words, Step 5 should not be used. Glitches do not happen in that particular example, 
but if they did, and if glitches were unacceptable, then they would have to be eliminated by redesigning 
the combinational (upper) section. Only if no solution could be found that way should Step 5 be 
considered (at the obvious cost of system resolution). 
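
The resolution limits can be illustrated with a toy model. In the sketch below (an assumption-laden illustration, not a circuit description) time is divided into four steps per clock period, a register is modeled as something that looks at its input only at sampling instants, and a pulse lasting three quarters of a period is applied between two rising edges.

```python
def sample(signal, steps_between_samples):
    """Values seen by a register that samples every `steps_between_samples` steps."""
    return [signal[t] for t in range(0, len(signal), steps_between_samples)]


# Three clock periods, four time steps each; rising edges at t = 0, 4, 8 and
# falling edges at t = 2, 6, 10. The pulse spans t = 1..3 (less than one period).
pulse = [0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    print(sample(pulse, 4))  # single-edge registers: one sample per period
    print(sample(pulse, 2))  # dual-edge operation: one sample per half period
```

The single-edge samples never see the pulse, whereas the dual-edge samples catch it once. This is the reason why an output that must follow its input combinationally should not be registered when maximum resolution is required.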


15.4 Design of Large Finite State Machines 


In all designs illustrated so far, the number of states was small, so the formal 5-step technique described 
earlier could be easily employed, leading to a circuit similar to that illustrated in Figure 15.21(a), 
with all flip-flops (D-type only) located in the lower section and the combinational logic in the upper 
section. There is, however, a practical limitation regarding that approach because it requires that 
all states be explicitly enumerated (declared); for example, if the signal generated in Example 15.6 
spanned 800 clock cycles instead of 8, then a straight application of that design technique would be 
completely impractical. 

The large number of states that might occur in a state machine is normally due to a "built-in" counter. 
To solve it, because we know how to design a counter of any type and any size (Chapter 14), we can 
separate the counter from the main design. In other words, the problem can be split into two parts, one 
related to the main FSM (for which the 5-step formal FSM design technique is still employed) and the 
other related to the counter, for which we can adopt a direct design, as we did in Sections 14.2 and 14.3. 
This, of course, might cause the Boolean expressions related to the counter to be nonminimal, as already 
illustrated in Example 15.4, but the results are still correct and the difference in terms of hardware is very 
small. The overall architecture can then be viewed as that in Figure 15.21(b), where the counter appears 
as an inner FSM, while the main circuit occupies the outer FSM. A more objective illustration appears in 
Figure 15.21(c), where the counter, in spite of being also an FSM, has been completely separated from the 
main circuit, with its output playing simply the role of an extra input to the main machine (along with x 
and pr_state). The use of this design technique is illustrated in the example below. 


FIGURE 15.21. (a) Basic FSM model; (b) and (c) Model for large FSMs (the counter is separated from the main 
machine). 


EXAMPLE 15.7 LARGE-WINDOW SIGNAL GENERATOR 


Redesign the signal generator of Example 15.6, but this time with the time windows lasting 30 and 
50 clock cycles instead of 3 and 5, as illustrated in Figure 15.22(a). 


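[Figure 15.22: (a) signal generator block with clk as input and y as output; (b) state transition diagram 
with the transition conditions counter<29, counter=29, counter<79, and counter=79.] 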


FIGURE 15.22. (a) Signal generator of Example 15.7 and (b) its state transition diagram. 


SOLUTION 


In this case, separating the counter from the main machine is advisable, which allows the system to 
be modeled as in Figure 15.21(c), where the counter’s output acts as an input to the main machine. 
The state transition diagram is then depicted in Figure 15.22(b). 

Counter design: A 0-to-79 counter is needed, which can be designed following any one of the 
several techniques described in Sections 14.2 and 14.3. A synchronous 4-bit counter, with parallel 
enable and flip-flop clear input, was chosen (see Example 14.2, Figure 14.9). Because a total of 7 bits 
are needed here, two such counters were utilized. The complete circuit is shown in Figure 15.23. 
Recall that clear, contrary to reset, is a synchronous input. In this example, clear='0' will occur when 
the counter reaches "1001111" (=79), which forces the circuit to return to zero at the next positive 
clock edge. 

FSM design: We need to design now the main machine (Figure 15.22(b)). To ease the deri- 
vation of the Boolean functions (without Karnaugh maps or other simplification techniques), 
we will make a slight modification in the state transition diagram shown in Figure 15.24(a) 
(compare it to Figure 15.22(b)). This will cause the expression for nx_state to be nonminimal, 
but again the result is correct and only slightly bigger (see comparison and comments at the 
end of Example 15.4). 

From the truth tables in Figure 15.24(b), the following expressions are obtained: 


$y = q$ 


FIGURE 15.23. 0-to-79 counter of Example 15.7 (two 4-bit synchronous counters were utilized to construct a 
7-bit circuit). 

FIGURE 15.24. (a) Simplified state transition diagram for the FSM of Figure 15.22; (b) Corresponding truth tables. 


where $m_{29}$ and $m_{79}$ are the minterms corresponding to the decimals 29 and 79, respectively, that is: 

$m_{29} = q_6' \cdot q_5' \cdot q_4 \cdot q_3 \cdot q_2 \cdot q_1' \cdot q_0$ 
$m_{79} = q_6 \cdot q_5' \cdot q_4' \cdot q_3 \cdot q_2 \cdot q_1 \cdot q_0 \rightarrow q_6 \cdot q_3 \cdot q_2 \cdot q_1 \cdot q_0$ 

Note that, as indicated above, for $m_{79}$ only the '1's need to be monitored because 79 is the counter's 
largest value, so no other prior value exhibits the same pattern of '1's. This information, nevertheless, 
is already available in the counter (see gate that computes clear in Figure 15.23). It is also important 
to remark that any two minterms with a separation of 50 would do, that is, we could have used $m_{25}$ 
with $m_{75}$, $m_{39}$ with $m_9$, etc. 

The circuit that computes the expressions above is shown in the upper section of the FSM depicted 
in Figure 15.25, while the DFF that produces q (pr_state) appears in the lower section. As described 
in the model of Figure 15.21(c), the counter is separated from the main machine, to which it serves 
simply as an extra input. Though flip-flops might provide complemented outputs, in Figure 15.25 
only the regular outputs were shown to simplify the drawing. Moreover, only NAND gates were 
used (smaller transistor count—Section 4.3) in the upper section, though other choices there also 
exist. Notice also that the fan-in of the gate computing $m_{29}$ is 8, which might be unacceptable for 
certain types of implementations (recall that the delay grows with the fan-in), so the reader is invited 
to reduce the fan-in to 5 or less, and also explore other gates for the upper section. 

FIGURE 15.25. Final circuit for the FSM of Example 15.7. The only external signals are clk (in) and y (out). 
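
The counter-plus-FSM arrangement of Example 15.7 can be exercised in software. The sketch below is a hedged illustration: the 0-to-79 counter and the one-flip-flop main machine follow the text (y = q, $m_{29}$ decoded in full, $m_{79}$ decoded from its '1's only), but the next-state function d = q'·m29 + q·m79' is our assumption of what the simplified diagram of Figure 15.24 implies, and all names are illustrative.

```python
def m29(counter):
    """Full minterm for decimal 29 ("0011101")."""
    return int(counter == 29)


def m79(counter):
    """Only the '1's of 79 ("1001111") are monitored: q6*q3*q2*q1*q0."""
    return int((counter & 0b1001111) == 0b1001111)


def generate(cycles):
    """Return y for `cycles` clock periods, starting from counter=0 and q=0."""
    counter, q, y = 0, 0, []
    for _ in range(cycles):
        y.append(q)  # y = q
        # Assumed next-state logic: leave '0' at count 29, leave '1' at count 79.
        d = ((1 - q) & m29(counter)) | (q & (1 - m79(counter)))
        # Synchronous clear: the counter returns to zero after reaching 79.
        counter = 0 if m79(counter) else counter + 1
        q = d
    return y


if __name__ == "__main__":
    y = generate(160)
    print(y[:30].count(0), y[30:80].count(1))
```

Under these assumptions y stays low for 30 clock periods and high for 50, and the pattern repeats every 80 periods. The shortcut decoder for 79 is safe because no count below 79 has all five monitored bits at '1'.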


15.5 Design of Finite State Machines with 
Complex Combinational Logic 


In Section 15.4 we described a technique that simplifies the design of FSMs whose number of states (due 
to a built-in counter) is too large. In this section we describe another technique that is appropriate for 
cases when the combinational section is too complex, that is, for machines whose outputs are expressed 
by means of functions that might change from one state to another. 

This model is illustrated in Figure 15.26(a), where a three-state FSM is exhibited, with y (the output) 
expressed by means of different functions of x (the input) in each state, that is, $y=f_1(x)$ in state A, $y=f_2(x)$ 
in state B, and $y=f_3(x)$ in state C. This, of course, can be modeled exactly as before; however, if the 
expressions for y are already available or are simple to derive, then the model of Figure 15.26(b) can be 
employed, which consists of a multiplexer (Section 11.6) whose input-select port (sel) is controlled by an 
FSM. This might substantially reduce the design effort because now only a much simpler FSM needs to 
be designed (a counter). This technique is illustrated in the example that follows. 


FIGURE 15.26. Alternative design technique for FSMs whose output expression varies from one state to 
another (multiplexer controlled by FSM). 


EXAMPLE 15.8 PWM 


The circuit employed in this example was already seen in Chapter 14 (Exercise 14.36). A digital PWM 
(pulse width modulator) is a circuit that takes a clock waveform as input and delivers a pulse train 
with variable (programmable) duty cycle at the output. In the illustration of Figure 15.27 the duty 
cycle is 2/7 (or 28.6%) because the output stays high during two out of every seven clock periods. 
The 3-bit signal called x determines the duty cycle of y, according to the table shown on the right of 
Figure 15.27. Design a circuit for this PWM. 


x        Duty cycle 
"000"    0/7 (0%) 
"001"    1/7 (14.3%) 
"010"    2/7 (28.6%) 
"011"    3/7 (42.9%) 
"100"    4/7 (57.1%) 
"101"    5/7 (71.4%) 
"110"    6/7 (85.7%) 
"111"    7/7 (100%) 


FIGURE 15.27. PWM of Example 15.8. 


SOLUTION 


This circuit can be understood as a two-window signal generator (like that in Example 14.8), where the 
size of the first time window is programmable (from 0 to $7T_0$, with $T_0$ representing the clock period), 
while the total size is fixed ($=7T_0$). Due to the programmable input (x), y will inevitably be expressed 
as a function of x. The first part of the solution is depicted in Figure 15.28. The state transition diagram, 
containing seven states (A, B, ..., G), is shown in Figure 15.28(a). The value of y in each state can be 
determined with the help of the table in Figure 15.28(b), where, for simplicity, the bits of x were 
represented by a, b, and c; for example, if x="000", then y='0' in all seven states; if x="001", then y='0' in all 
states but the last, and so on. From that table we obtain: 


FIGURE 15.28. (a) State transition diagram for the PWM of Figure 15.27; (b) Truth table describing the system 
operation; (c) Truth table for the FSM. 


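In A: $y = f_1(x) = a \cdot b \cdot c$ 
In B: $y = f_2(x) = (\text{previous}) + a \cdot b = a \cdot b \cdot c + a \cdot b = a \cdot b$ 
In C: $y = f_3(x) = (\text{previous}) + a \cdot c = a \cdot b + a \cdot c$ 
In D: $y = f_4(x) = (\text{previous}) + a = a \cdot b + a \cdot c + a = a$ 
In E: $y = f_5(x) = (\text{previous}) + b \cdot c = a + b \cdot c$ 
In F: $y = f_6(x) = (\text{previous}) + b = a + b \cdot c + b = a + b$ 
In G: $y = f_7(x) = (\text{previous}) + c = a + b + c$ 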


These are the expressions that are included in the state transition diagram of Figure 15.28(a). 

As can be seen, y is computed by a different function of x in each state, so the mixed model (FSM + 
multiplexer) described above can be helpful (though the general model seen earlier can obviously 
still be used). The resulting circuit is presented in Figure 15.29, which shows, on the left, the circuit 
needed to compute the seven mux inputs, and, on the right, the overall system. The FSM is now a 
trivial 0-to-6 counter, and the multiplexer can be any of those studied in Section 11.6 or equivalent 
(counter and multiplexer designs were studied in Sections 14.2, 14.3, and 11.6, respectively). 
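
The mixed model (counter plus multiplexer) is easy to check numerically. The following sketch is only an illustration with names of our choosing: it computes the seven multiplexer inputs from the bits a, b, c of x using the expressions derived for states A through G, lets a 0-to-6 counter drive sel, and verifies that the number of high clock periods in each 7-period frame equals the value of x.

```python
def mux_inputs(a, b, c):
    """Multiplexer data inputs f1..f7 for states A..G (x = abc, a is the MSB)."""
    return [
        a & b & c,          # A: x = 7
        a & b,              # B: x >= 6
        (a & b) | (a & c),  # C: x >= 5
        a,                  # D: x >= 4
        a | (b & c),        # E: x >= 3
        a | b,              # F: x >= 2
        a | b | c,          # G: x >= 1
    ]


def pwm_frame(x):
    """One 7-period frame of y: a 0-to-6 counter selects the mux input."""
    a, b, c = (x >> 2) & 1, (x >> 1) & 1, x & 1
    inputs = mux_inputs(a, b, c)
    return [inputs[sel] for sel in range(7)]


if __name__ == "__main__":
    for x in range(8):
        frame = pwm_frame(x)
        print(format(x, "03b"), frame, f"{sum(frame)}/7")
```

Each function is simply a threshold test on x, which is why the list grows by one term per state; the high periods are grouped at the end of the frame, matching the remark that for x="001" only the last state produces y='1'.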

Note: The purpose of the solution presented above is to illustrate the general idea of “nonstatic” 
outputs (that is, outputs that might change in spite of the machine staying in the same state). 
However, for the particular circuit above (PWM), a much simpler solution can be devised because 
it is simply a two-window signal generator (Section 14.4) whose intermediate value is variable 
(depends on x), while the total length is fixed (=7). Therefore, if the FSM approach is not employed, 
the circuit can be designed as in Section 14.4; or, if it is employed, then it can be modeled exactly as 
in Figure 15.24. 
FIGURE 15.29. Circuit of Example 15.8. 


15.6 Multi-Machine Designs 


In some designs more than one FSM might be needed. Two fundamental aspects related to such designs 
are described in this section: synchronism and reduction to quasi-single machines. 


Synchronism 


When using more than one machine to implement a circuit, it is necessary to determine whether the 
signals that they produce must be synchronized with respect to each other. As an example, suppose that 
a certain design contains two FSMs, which produce the signals $y_1$ and $y_2$ depicted in Figure 15.30(a); 
when ANDed, they give rise to the desired signal ($y = y_1 \cdot y_2$) shown in the last plot of the figure. If $y_1$ and $y_2$ are not 
synchronized properly, a completely different waveform can result for y, like that in Figure 15.30(b), in 
spite of the shapes of $y_1$ and $y_2$ remaining exactly the same. 

Several situations must indeed be considered. In Figure 15.31(a), $y_1$ and $y_2$ exhibit different frequencies 
(because $T_1 = 4T_0$ and $T_2 = 5T_0$, where $T_0$ is the clock period), so synchronism might not be important 
(observe, however, that only when the least common multiple of $T_1$ and $T_2$ is equal to $T_1 \cdot T_2$ is synchronism 
completely out of the question). The second situation, depicted in Figure 15.31(b), shows $y_1$ and $y_2$ with the 
same frequency and operating at the same clock edge (positive, in this example); this means that it is 
indeed just one machine, which produces both outputs, so sync is not an issue. Finally, Figure 15.31(c) 
shows the two machines operating with the same frequency but at different clock transitions (positive 
edge for $y_1$, negative edge for $y_2$); in this type of situation, sync is normally indispensable. 

Note in Figure 15.31(c) that the ith state of $y_2$ (i=A, B, C, D) comes right after the ith state of $y_1$ (that is, 
one-half of a clock cycle later). This implies that pr_state2 is simply a copy of pr_state1, but delayed by 
one-half of a clock period. Consequently, the machines can be synchronized using the method shown in 
Figure 15.32; in (a), the general situation in which the machines work independently (no sync) is depicted, 
while (b) shows them synchronized by the interconnection nx_state2=pr_state1. This synchronization 
procedure is illustrated in the example below. 
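
The effect of the interconnection nx_state2=pr_state1 can be seen in a half-period simulation. The sketch below is a simplified illustration under our own assumptions: both machines have four states (A, B, C, D) visited cyclically, both are reset to A, the first machine updates at rising edges (even ticks), and the second updates at falling edges (odd ticks).

```python
STATES = "ABCD"


def simulate(half_periods):
    """Return (pr_state1, pr_state2) for each half clock period."""
    s1 = s2 = "A"  # assumed reset state for both machines
    trace = []
    for tick in range(half_periods):
        if tick % 2 == 0 and tick > 0:
            # Rising edge: machine 1 advances to its next state.
            s1 = STATES[(STATES.index(s1) + 1) % len(STATES)]
        elif tick % 2 == 1:
            # Falling edge: machine 2 loads nx_state2 = pr_state1.
            s2 = s1
        trace.append((s1, s2))
    return trace


if __name__ == "__main__":
    for tick, (s1, s2) in enumerate(simulate(10)):
        print(tick, s1, s2)
```

In the printed trace pr_state2 always equals the value pr_state1 had one tick (half a clock period) earlier, so the second machine needs no next-state logic of its own and cannot drift with respect to the first.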
FIGURE 15.30. Different shapes can result for $y = y_1 \cdot y_2$ when $y_1$ and $y_2$ are not synchronized properly. 

FIGURE 15.31. Designs employing two FSMs operating (a) with different frequencies (sync might not be 
important); (b) with the same frequency and at the same clock edge (therefore a single machine, so sync is not 
an issue); and (c) with the same frequency but at different clock transitions (sync normally required). 

FIGURE 15.32. (a) Two FSMs operating independently (no sync); (b) Machines synchronized by the 
interconnection nx_state2=pr_state1; (c) Reduction to a quasi-single machine due to the interconnection 
pr_state2=pr_state1. 


EXAMPLE 15.9 CIRCUIT WITH SYNCHRONIZED MACHINES 


Using the FSM approach, design a circuit that is capable of generating the signals depicted in 
Figure 15.31(c). 


SOLUTION 


The state transition diagrams of both machines are shown in Figure 15.33(a). Applying the 5-step 
design procedure of Section 15.2, the circuit of Figure 15.33(b) results (it is left to the reader to verify 
that). Observe the interconnection nx_state2=pr_state1, which is responsible for the synchronism. 
Note also that in this case both outputs ($y_1$ and $y_2$) are subject to glitches, so Step 5 of the design 
procedure must be included if glitch-free signals are required. 




FIGURE 15.33. Two synchronized FSMs (see nx_state2=pr_state1 interconnection in (b)) that implement the 
waveforms of Figure 15.31(c). Both outputs ($y_1$, $y_2$) are subject to glitches, so Step 5 of the design procedure 
is necessary if glitch-free signals are required (see also Exercise 15.21). 

FIGURE 15.34. (a) and (b) Waveforms constructed by combining the signals '0', '1', clk, and clk' side by side; 
(c) Corresponding generic circuit; (d) Equivalent circuit implemented with a quasi-single machine. 


From multi-machines to quasi-single machine 


When the machines have the same number of states, they can be reduced to a quasi-single machine, as 
shown in Figure 15.32(c). This is achieved by including Step 5 of the design procedure (Section 15.2) in 
the second FSM (see register at its output) and then making the connection pr_state2=pr_state1, which 
eliminates the sequential section of the second FSM. Observe, however, that this technique is not always 
advantageous (see Exercise 15.20; see also Exercises 15.15, 15.16, and 15.21). 


15.7 Generic Signal Generator Design Technique 


As seen in previous examples, the study of signal generators offers an invaluable opportunity for mas- 
tering not only FSM design techniques, but digital logic concepts in general. In this section, another 
interesting design technique is presented, which can be applied to the construction of any binary 
waveform. 

Such a technique is derived from the observation that any binary waveform can be constructed with 
the proper combination of the signals '0', '1', clk, and clk', placed side by side. Two examples are depicted 
in Figure 15.34. The waveform in Figure 15.34(a) has period $T=2T_0$ and can be constructed with the 
sequence {clk → '1'} in each period of y. The example in Figure 15.34(b) is more complex; its period is 
$T=7T_0$ and it can be constructed with the sequence {clk → '0' → clk' → '1' → clk → clk' → clk}. 
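
The side-by-side construction can be mimicked in a few lines. The sketch below is an idealized illustration (no propagation delays, hence no glitches): each clock period is split into two half-period samples, and we assume that a period begins at the rising edge, so clk contributes (1, 0) and clk' contributes (0, 1). The names are ours.

```python
# Half-period samples contributed by each building block during one clock period
# (assumption: the period starts at the rising edge of clk).
SEGMENTS = {
    "clk": (1, 0),
    "clk'": (0, 1),
    "0": (0, 0),
    "1": (1, 1),
}


def build_waveform(sequence, periods=1):
    """Concatenate the building blocks in `sequence`, repeated `periods` times."""
    one_period = [level for name in sequence for level in SEGMENTS[name]]
    return one_period * periods


if __name__ == "__main__":
    print(build_waveform(["clk", "1"]))
    print(build_waveform(["clk", "0", "clk'", "1", "clk", "clk'", "clk"]))
```

The first call reproduces a waveform of period $2T_0$ (four half-period samples) and the second a waveform of period $7T_0$ (fourteen samples). In hardware the list lookup corresponds to MUX1 and the position in the sequence to the state of FSM1; the second FSM-MUX pair described next exists only to remove the glitches that this ideal model ignores.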

The generic circuit (capable of implementing any binary waveform) introduced here is depicted 
in Figure 15.34(c). It consists of two multiplexers, controlled by two FSMs that operate at different 
clock edges. The pair FSM1-MUX1 is responsible for generating the desired waveform, while the pair 
FSM2-MUX2 is responsible for removing any glitches (recall that glitches are never allowed in signal 
generators). 

The operation of MUX2 is as follows. During the time intervals in which the output of MUX1 (called x) 
exhibits no glitch, MUX2 must select x. On the other hand, if a glitch occurs when x is supposed to stay high, 
then '1' must be selected to eliminate it; similarly, if a glitch occurs when x is supposed to stay low, then '0' 
must be selected to suppress it. The use of this technique is illustrated in the example that follows. 

Finally, note that the FSMs of Figure 15.34(c) can be reduced to a quasi-single machine using the 
technique described in Section 15.6. The resulting circuit is shown in Figure 15.34(d) (see Exercises 15.15 
and 15.16). 

EXAMPLE 15.10 GENERIC SIGNAL GENERATOR DESIGN TECHNIQUE 


Design the signal generator of Figure 15.35, which must derive, from clk, the signal y. Note that in 
this case the output has transitions at both clock edges, so the circuit’s resolution must be maximum 
(that is, one-half of a clock period), in which case the use of two FSMs might be helpful. 


FIGURE 15.35. Signal generator of Example 15.10. 


SOLUTION 


As shown in Figure 15.35, each period of y can be constructed with the sequence {'1' → clk → '1'}. 
A tentative solution, with only one FSM-MUX pair, is shown in Figure 15.36(a). The multiplexer's 
only inputs are clk and '1', because these are the signals needed to construct y. To generate the 
sequence above, the FSM must produce sel='1' (to select the '1' input), followed by sel='0' (to select 
clk), and finally sel='1' (to select '1' again). 

There is a major problem, though, with this circuit. Observe in Figure 15.36(b) that when the mul- 
tiplexer completes the transition from its upper input (clk) to its lower input ('1'), clk is already low 
(some propagation delay in the mux is inevitable), thus causing a glitch at the output. Consequently, 
for this approach to be viable, some kind of glitch removal technique must be included. 

The problem can be solved with the addition of another FSM-MUX pair, as previously shown 
in Figure 15.34(c). Since the glitch in x occurs when x is high, '1' must be selected during that time 
interval, with x chosen at all other times. 

The generic circuit of Figure 15.34(c), adapted to the present example, is shown in Figure 15.37(a) 
(note that not all input signals are needed). A corresponding timing diagram is shown in 
Figure 15.37(b). MUX1 must generate x, which requires the sequence {'1' → clk → '1'}, so 
sel1={'1' → '0' → '1'} must be produced by FSM1. The selection performed by MUX2 depends on the glitch 
locations; in this example, there is only one glitch, which occurs while x='1', so MUX2 must select 
its lower input ('1') during that time slot (so sel2='1'), and should select x (so sel2='0') during 
the other two time slots that comprise each period of y. In summary, MUX2 must generate the 
sequence {x → '1' → x}, so sel2={'0' → '1' → '0'} must be produced by FSM2. 
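The glitch and its removal can be reproduced with a small discrete-time sketch in Python (not part of the original text). Several modeling choices are assumptions made only for illustration: FSM1 is taken to change state at the falling clock edge (consistent with clk being already low when MUX1 leaves the clk input), FSM2 follows half a period later, each half-period is divided into HALF time steps, and both multiplexers react DELAY steps after their select input changes.

```python
HALF = 4            # time steps per half clock period (arbitrary choice)
DELAY = 1           # assumed mux switching delay, in time steps
PERIOD = 2 * HALF   # one clock period
SEL1 = [1, 0, 1]    # FSM1 output in states A, B, C: select '1', clk, '1'
SEL2 = [0, 1, 0]    # FSM2 output in states A, B, C: select x, '1', x

def clk(t):
    # Assumption: each FSM1 state starts at a falling edge (low half first).
    return 0 if t % PERIOD < HALF else 1

def sel1(t):
    # FSM1 changes state at falling edges; the mux reacts DELAY steps later.
    return SEL1[((t - DELAY) // PERIOD) % len(SEL1)]

def sel2(t):
    # FSM2 follows FSM1 half a period later (rising edges), plus mux delay.
    return SEL2[((t - DELAY - HALF) // PERIOD) % len(SEL2)]

def x(t):
    # MUX1: sel1='1' selects the constant '1', sel1='0' selects clk.
    return 1 if sel1(t) else clk(t)

def y(t):
    # MUX2: sel2='1' selects the constant '1', sel2='0' selects x.
    return 1 if sel2(t) else x(t)

span = range(3 * PERIOD)            # one full period of y (states A, B, C)
x_wave = [x(t) for t in span]
y_wave = [y(t) for t in span]
print("x:", "".join(map(str, x_wave)))
print("y:", "".join(map(str, y_wave)))
```

In this model x shows a one-step low pulse right after the clk-to-'1' transition (the glitch), whereas y stays high at that instant because MUX2 is already selecting '1' when the glitch occurs.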

FIGURE 15.36. (a) FSM-MUX-based solution (not OK) for the signal generator of Figure 15.35; (b) Glitch 
formation during the clk-to-'1' state transition. 

FIGURE 15.37. Example using the generic 2-FSM-MUX approach for the design of signal generators: 
(a) Diagram for generating the signal of Figure 15.35; (b) Corresponding timing diagram showing a glitch 
in x; (c) Values of sel1 and sel2 with respective FSM state names; (d) State transition diagrams of FSM1 and 
FSM2; (e) Final circuit (multiplexers not shown). (See also Exercise 15.21.) 


The timing diagram is repeated in Figure 15.37(c), showing the names for the states (A, B, C) to 
be used in the FSMs. From this diagram, the state transition diagrams of both FSMs can be obtained 
and are displayed in Figure 15.37(d). Applying the 5-step design procedure of Section 15.2 to these 
two diagrams, the expressions below result. 

For FSM1: d1 = q0, d0 = q1'·q0', and sel1 = q0' 
For FSM2: nx_state2 = pr_state1 (for sync) and sel2 = q0 


With these expressions, the final circuit can be drawn (shown in Figure 15.37(e)). Notice the use 
of the synchronization method described in Section 15.6. The multiplexers to which sel1 and sel2 will 
be connected can be constructed using any of the techniques described in Section 11.6. 


Note: For a continuation of this generic design technique for signal generators see Exercises 15.15 
and 15.16 and also Section 23.2. 


15.8 Design of Symmetric-Phase Frequency Dividers 


A signal generator is a circuit that takes the clock as the main input and from it produces a predefined 
glitch-free signal at the output (Section 14.4). As seen in Section 14.5, a frequency divider is a particular 
case of signal generator in which the output is simply a two-window signal with the windows 
sometimes required to exhibit equal length (50% duty cycle). If the clock frequency must be divided 
by M, with M even, then the circuit needs to operate only at one of the clock edges (resolution = one 
clock cycle, Section 15.3); on the other hand, if M is odd, the circuit must operate at both clock transitions 
(resolution = one-half clock cycle), thus requiring a slightly more complex solution. 

FIGURE 15.38. Divide-by-5 circuit with symmetric phase: (a) Waveforms; (b) General circuit diagram. 

The design of this type of circuit using the FSM approach is very simple. The problem is illustrated in 
Figure 15.38 for M=5. In Figure 15.38(a), the waveforms are shown. Because M=5, five states (A, B, ...) 
are required. The circuit must produce the waveform x1, which is '0' during (M-1)/2 clock periods and 
'1' during (M+1)/2 clock cycles. A copy of x1 (called x2), delayed by one-half clock period, must also be 
produced such that ANDing these two signals results in the desired waveform (y = x1·x2). Note that y 
has symmetric phase, and its frequency is fclk/M. 
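The waveform arithmetic above can be checked with a brief Python sketch (an illustration, not the book's design). It samples the signals twice per clock cycle, treats x1 as changing only at rising edges, and models x2 as x1 shifted by one sample; the function name symmetric_divider is hypothetical.

```python
def symmetric_divider(m):
    """Return one output period of (x1, x2, y), two samples per clock cycle."""
    if m < 3 or m % 2 == 0:
        raise ValueError("this sketch covers odd M >= 3 only")
    low_states = (m - 1) // 2
    x1 = []
    for state in range(m):
        level = 0 if state < low_states else 1
        x1 += [level, level]          # x1 only changes at rising edges
    x2 = [x1[-1]] + x1[:-1]           # x1 delayed by half a clock period
    y = [a & b for a, b in zip(x1, x2)]
    return x1, x2, y

x1, x2, y = symmetric_divider(5)
print("x1:", x1)
print("x2:", x2)
print("y :", y)
print("duty cycle:", sum(y) / len(y))
```

Because x1 is high for (M+1)/2 clock cycles and the AND operation removes exactly one half-cycle, y is high for M/2 clock cycles out of M, that is, a 50% duty cycle.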

The circuit is depicted in Figure 15.38(b). It contains a positive-edge FSM, which produces x1, followed 
by an output stage that stores x1 at the negative transition of the clock to produce x2. An additional 
(optional) flip-flop was also included, which is needed only when x1 is subject to glitches. The example 
below illustrates the use of this design technique. 


EXAMPLE 15.11 DIVIDE-BY-5 WITH SYMMETRIC PHASE 

Using the FSM approach, design a circuit that divides fclk by 5 and produces an output with 50% 
duty cycle. 

SOLUTION 


Our circuit must produce the waveforms depicted in Figure 15.38(a). The FSM is a 5-state machine, 
whose output (x1) must be low during (M-1)/2=2 clock periods and high during (M+1)/2=3 clock 
cycles. Hence the corresponding truth table for x1 and nx_state is that depicted in Figure 15.39 (with 
sequential binary encoding for the states), from which, with the help of Karnaugh maps, the following 
equations result: 

x1 = q2 + q1 
d2 = q1·q0 
d1 = q1·q0' + q1'·q0 
d0 = q2'·q0' 

Therefore, the corresponding circuit for this FSM is that within the dark boxes in Figure 15.39. We 
must now make a decision regarding whether its output (x1) is subject to glitches or not. Looking at its 
expression (x1 = q2 + q1), we verify, for example, that when the system moves from state D (q2q1q0="011") 
to state E (q2q1q0="100"), all bits change; therefore, if q1 goes to '0' before q2 has had time to rise to '1', 
x1='0' will occur, which constitutes a glitch because x1 was supposed to stay high during that transition 
(see Figure 15.38(a)). Consequently, the extra DFF shown at the output of the FSM is needed. The other 
DFF shown in Figure 15.39, which operates at the negative clock transition, stores x1 to produce x2, 
hence resulting in y = x1·x2 at the output. 

FIGURE 15.39. Symmetric-phase divide-by-5 circuit of Example 15.11. 
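As a quick sanity check, the next-state and output equations of this example can be stepped through in Python. The sketch below is only an illustration; it assumes the equations read x1 = q2 + q1, d2 = q1·q0, d1 = q1·q0' + q1'·q0, and d0 = q2'·q0' (which is what sequential binary encoding of five states with don't-care unused codes yields), and the function names are hypothetical.

```python
def next_state(q2, q1, q0):
    """Next-state logic of the divide-by-5 FSM (sequential binary encoding)."""
    d2 = q1 & q0
    d1 = (q1 & (1 - q0)) | ((1 - q1) & q0)
    d0 = (1 - q2) & (1 - q0)
    return d2, d1, d0

def output_x1(q2, q1, q0):
    """Output logic: x1 = q2 + q1."""
    return q2 | q1

state = (0, 0, 0)                      # state A
trace = []
for _ in range(5):
    trace.append((state, output_x1(*state)))
    state = next_state(*state)

for bits, x1 in trace:
    print("".join(map(str, bits)), "x1 =", x1)
print("back to state A:", state == (0, 0, 0))
```

The trace visits the codes 000 through 100 in order, with x1 low in the first two states and high in the remaining three, and then returns to 000.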

Note: The counter above has only five states (so it requires 3 flip-flops) and is designed such that 
the output stays low during (M-1)/2 clock periods and high during (M+1)/2 periods. Recall the 
approach used to design prescalers in Section 14.6, which requires ⌈M/2⌉ flip-flops to implement a 
divide-by-M circuit. Because M is 5 in the example above, 3 DFFs are needed to implement the 
corresponding prescaler, which is the same number of DFFs needed above (but prescalers are faster). 
Moreover, they produce an output low during (M-1)/2 clock periods and high during (M+1)/2 
periods automatically. In summary, when designing small symmetric-phase frequency dividers, the 
prescaler approach (for the counter only, of course) should be considered. 


15.9 Finite State Machine Encoding Styles 


As seen in the examples above, to design an FSM we need first to identify and list all its states. This 
“enumeration” process causes the data type used to represent the states to be called an enumerated type. 
To encode it, several schemes are available, which are described next. The enumerated type color shown 
below (using VHDL syntax), which contains four states, will be used as an example. 


TYPE color IS (red, green, blue, white); 


Sequential binary encoding: In this case, the minimum number of bits is employed, and the states are 
encoded sequentially in the same order in which they are listed. For the type color above, two bits are 
needed, resulting in red="00" (=0), green="01" (=1), blue="10" (=2), and white="11" (=3). Its advantage 
is that it requires the least number of flip-flops; with N flip-flops (N bits), up to 2^N states can be encoded. 
The disadvantage is that it requires more combinational logic and so might be slower than the others. 
This is the encoding style employed in all examples above, except for Example 15.4, in which four DFFs 
were employed instead of three (that system has only seven states). 

State     Encoding Style 
          Seq. binary   Two-hot   One-hot 
state0    000           00011     00000001 
state1    001           00101     00000010 
state2    010           01001     00000100 
state3    011           10001     00001000 
state4    100           00110     00010000 
state5    101           01010     00100000 
state6    110           10010     01000000 
state7    111           01100     10000000 

FIGURE 15.40. Some encoding options for an eight-state FSM. 

One-hot encoding: At the other extreme is the one-hot encoding style, which uses one flip-flop per state 
(so with N flip-flops only N states can be encoded). It demands the largest number of flip-flops but the 
least amount of combinational logic, being therefore the fastest. The total amount of hardware, however, 
is normally larger (or even much larger, if the number of states to be encoded is large) than that in the 
previous option. For the type color above the encoding would be red="0001", green ="0010", blue="0100", 
and white ="1000". 

Two-hot encoding: This style is in between the two styles above. It presents 2 bits active per state. Then 
with N flip-flops (N bits), up to N(N-1)/2 states can be encoded. For the type color above the encoding 
would be red="0011", green ="0101", blue="1001", and white ="0110". 

Gray encoding: Values are encoded sequentially using Gray code (Section 2.3). For the type color above 
the encoding would be red="00", green="01", blue="11", and white="10". The amount of hardware and 
the speed are comparable to the sequential binary option. 

User-defined encoding: This includes any other encoding scheme chosen by the designer (as in Example 15.4). 

The one-hot style might be used in applications where flip-flops are abundant, like in FPGAs (field 
programmable gate arrays, Chapter 18), while in compact ASIC (application-specific integrated circuit) 
implementations the sequential binary style is often chosen. As an example, suppose that our FSM has 
eight states. Then the encoding for three of the options listed above would be that shown in Figure 15.40. 
The number of flip-flops required in each case is three for sequential binary, five for two-hot, and eight 
for one-hot. In VHDL, there is a special attribute that allows the user to choose any encoding style—it is 
called enum_encoding and will be seen in Section 19.16. 
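The encoding styles described above can be generated programmatically. The Python sketch below is an illustration only (the function names are hypothetical, and the ordering of the two-hot codewords is an assumption chosen to match the values quoted in the text); it lists the codewords for a given number of states.

```python
from itertools import combinations

def min_bits(n_states):
    """Smallest N such that 2**N >= n_states (at least 1 bit)."""
    return max(1, (n_states - 1).bit_length())

def sequential_binary(n_states):
    width = min_bits(n_states)
    return [format(i, f"0{width}b") for i in range(n_states)]

def gray(n_states):
    width = min_bits(n_states)
    return [format(i ^ (i >> 1), f"0{width}b") for i in range(n_states)]

def one_hot(n_states):
    # One flip-flop per state, a single bit high in each codeword.
    return [format(1 << i, f"0{n_states}b") for i in range(n_states)]

def two_hot(n_states):
    # Smallest N with N(N-1)/2 >= n_states, two bits high per codeword.
    width = 2
    while width * (width - 1) // 2 < n_states:
        width += 1
    codes = [format((1 << i) | (1 << j), f"0{width}b")
             for i, j in combinations(range(width), 2)]
    return codes[:n_states]

for name, encoder in [("seq. binary", sequential_binary), ("gray", gray),
                      ("two-hot", two_hot), ("one-hot", one_hot)]:
    codes = encoder(8)
    print(f"{name:12s} {len(codes[0])} flip-flops: {codes}")
```

For eight states the sketch uses three, three, five, and eight flip-flops for the sequential binary, Gray, two-hot, and one-hot styles, respectively, in agreement with the counts given above.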


EXAMPLE 15.12 ONE-HOT-ENCODED COUNTER 


A one-hot counter is expected to be simply a circular shift register (Section 14.1) with only one bit 
high. Design a five-state counter using one-hot encoding to verify that fact. 


SOLUTION 


The state transition diagram is depicted in Figure 15.41(a), and the corresponding truth table, 
utilizing one-hot encoding, is shown in Figure 15.41(b). From the truth table we obtain d4 = q3, 
d3 = q2, d2 = q1, d1 = q0, and d0 = q4, which are the equations for a circular shift register (Figure 15.41(c)). 
However, no initialization scheme is provided in this design (as is the case for any shift register), 
which then needs to be provided separately. This was done in Figure 15.41(c) by connecting the reset 
input to the preset port of the first flip-flop and to the reset port of all the others, thus causing the 
system to start from state A (that is, q4q3q2q1q0="00001"). 

FIGURE 15.41. Five-state counter using one-hot encoding (a circular shift register). 
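The circular-shift behavior can be confirmed with a few lines of Python. This is only a behavioral sketch (the function name step is hypothetical); it assumes the shift direction implied by the state sequence A="00001", B="00010", and so on, that is, d4 = q3, d3 = q2, d2 = q1, d1 = q0, d0 = q4.

```python
def step(state):
    """One clock tick of the one-hot counter (a circular shift register)."""
    q4, q3, q2, q1, q0 = state
    return (q3, q2, q1, q0, q4)   # d4=q3, d3=q2, d2=q1, d1=q0, d0=q4

state = (0, 0, 0, 0, 1)           # state A, set by the reset/preset wiring
sequence = []
for _ in range(5):
    sequence.append("".join(map(str, state)))
    state = step(state)

print(sequence)
print("returns to A:", state == (0, 0, 0, 0, 1))
```

Note that a register that powers up with all bits at '0' would stay there forever, which is why the separate initialization mentioned above is required.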


EXAMPLE 15.13 GRAY-ENCODED COUNTER 


This example further illustrates the use of encoding styles for FSMs other than sequential binary. 
Design the five-state counter seen in the previous example, this time encoding its states using Gray 
code instead of one-hot code. 


SOLUTION 


The corresponding state transition diagram is shown in Figure 15.42(a) with the Gray values 
included. Recall, however, from Section 2.3, that Gray codes are MSB reflected (the codewords are 
reflected with respect to the central words and differ only in the MSB position—see example in 
Figure 2.3), so when the number of states is odd the resulting code is not completely Gray, because 
the first and the last codewords differ in two positions instead of one. On the other hand, when the 
number of states is even, a completely Gray code results if the codewords are picked from the code- 
word list starting at the center and moving symmetrically in both directions (try this in Figure 2.3). 

The truth table for the FSM of Figure 15.42(a) is presented in Figure 15.42(b), from which, with the 
help of Karnaugh maps, the following equations are obtained for nx_state: 

d2 = q2'·q1·q0' 
d1 = q2'·q1 + q0 
d0 = q1' 

A circuit that implements these equations is shown in Figure 15.42(c). 
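The Gray-encoded counter can be stepped through in Python as a check. The sketch is illustrative only; it assumes the state codes A="000", B="001", C="011", D="010", E="110" and the next-state expressions d2 = q2'·q1·q0', d1 = q2'·q1 + q0, d0 = q1', which follow from that truth table with the unused codes treated as don't-care. The helper names are hypothetical.

```python
def next_state(q2, q1, q0):
    """Next-state logic of the five-state Gray-encoded counter."""
    d2 = (1 - q2) & q1 & (1 - q0)
    d1 = ((1 - q2) & q1) | q0
    d0 = 1 - q1
    return d2, d1, d0

def hamming(a, b):
    """Number of bit positions in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

state = (0, 0, 0)                   # state A
visited = []
for _ in range(5):
    visited.append(state)
    state = next_state(*state)

distances = [hamming(visited[i], visited[(i + 1) % 5]) for i in range(5)]
print(["".join(map(str, s)) for s in visited])
print("bit changes per transition:", distances)
```

Only the wrap-around transition (E to A) changes two bits, which illustrates the remark that with an odd number of states the code is not completely Gray.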

pr_state         nx_state 
     q2 q1 q0         d2 d1 d0 
(A)  0  0  0     (B)  0  0  1 
(B)  0  0  1     (C)  0  1  1 
(C)  0  1  1     (D)  0  1  0 
(D)  0  1  0     (E)  1  1  0 
(E)  1  1  0     (A)  0  0  0 

FIGURE 15.42. Gray-encoded counter of Example 15.13. 


15.10 Exercises 


1. 0-to-9 counter 
a. Using the FSM approach, design a 0-to-9 counter. Prove that its equations are: 
d3 = q3·q0' + q2·q1·q0 
d2 = q2·q1' + q2·q0' + q2'·q1·q0 
d1 = q1·q0' + q3'·q1'·q0 
d0 = q0' 
b. Compare the resulting circuit (equations) with that designed in Example 14.3 (Figure 14.10). 
Which one has simpler equations? Why? 


2. Modulo-7 counter 


Example 15.3 shows the design of a synchronous modulo-2° counter. Redesign it, assuming that it 
must now be a modulo-7 counter with the following specification: 


a. Counting from 0 to 6 
b. Counting from 1 to 7 
3. 3-to-9 counter with three flip-flops 


a. Using the FSM approach, design a 3-to-9 counter with the minimum number of flip-flops (that is, 
3, because this machine has 7 states). Remember, however, that the actual output (y) must be 4 bits 
wide to represent all decimal numbers from 3 to 9. To reduce the amount of combinational logic 
in the conversion from 3 to 4 bits, try to find a suitable (3-bit) representation for the FSM states. 
b. Compare the resulting circuit with that designed in Example 14.6 (Figure 14.13). Note, however, 
that sequential 0-to-6 encoding was employed there, while a different encoding is likely to be 
chosen here. 


4. Counter with Gray output 


Redesign the counter of Example 15.3, but instead of producing sequential binary code its output 
should be Gray-encoded (see Example 15.13). 


5. String comparator #1 


Design an FSM that has two bit-serial inputs, a and b, and a bit-serial output, y, whose function is to 
compare a with b, producing y='1' whenever three consecutive bits of a and b are equal (from right 
to left), as depicted in the following example: a="...00110100", b="...01110110", y="...00110000". 


6. String comparator #2 
In Example 15.5 a very simple string detector was designed, which detects the sequence "111". 


a. Can you find a simpler (trivial) solution for that problem? (Suggestion: Think of shift registers 
and logic gates.) Draw the circuit corresponding to your solution then discuss its advantages 
and disadvantages (if any) compared to the FSM-based solution seen in Example 15.5. 


b. Suppose that now a string with 64 ones must be detected instead of 3 ones. Is your solution still 
advantageous? Explain. 


c. Suppose that instead of only ones, the detection involved also zeros (with an arbitrary composi- 
tion). Does this fact affect the approach used in your solution? Explain. 


7. String comparator #3 


Apply the same discussion of Exercise 15.6 to the design of Exercise 15.5. Can you find a simpler 
(non-FSM-based) solution for that problem? In which respect is your solution advantageous? 


8. Circular register 


Figure E15.8 shows the diagram of an FSM controlling four switches. The machine must close one 
switch at a time, in sequence (that is, A, then B, then C, then D, then A, ...), keeping it closed for n 
clock periods. 

FIGURE E15.8. 


a. Design an FSM that solves this exercise for n=1. 


b. Can this exercise be solved with a circular shift register (see Figure 14.2(c) and Example 15.12)? 
c. Repeat the exercise above for n=1 in A, n=2 in B, n=3 in C, and n=4 in D. 


d. How would your solution be if n were much larger (for example, n=100)? 


9. Extension of Example 15.6 
The FSM depicted in Figure E15.9 is somewhat similar to that in Example 15.6 but with an input 
variable (x) added to it (it is now a Mealy machine). For the circuit to move from state A to state B, 
not only a certain amount of time (time_up) must pass, but x='1' must also occur. The time count- 
ing should only start after x='1' is received, resetting it if x='0' happens before time_up has been 
completed. A similar condition is defined for the circuit to return from B to A. Design the corre- 
sponding circuit, considering that time_up and time_down are three and five clock periods, respec- 
tively, as in Example 15.6. 
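
Transition conditions shown in Figure E15.9: stay in A while x='0' OR (x='1' AND NOT time_up); 
A to B when x='1' AND time_up; stay in B while x='1' OR (x='0' AND NOT time_down); B to A when 
x='0' AND time_down. 

FIGURE E15.9. 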

10. Frequency divider with symmetric phase 
Using the FSM approach, design a circuit that divides the clock frequency by M and produces an 
output with 50% duty cycle (see Section 15.8). Design two circuits, as indicated below, then compare 
the results. 
a. For M=6 
b. For M=7 
How would your solution be if M were large, for example 60 or 61 instead of 6 or 7? 

11. Signal generator #1 


Design an FSM capable of deriving, from clk, the waveforms x and y shown in the center of Figure 
E15.11-12. Was Step 5 of the design procedure necessary in your solution? 


FIGURE E15.11-12. 


12. Signal generator #2 


Design an FSM capable of deriving, from clk, the waveforms x and y shown on the right hand side 
of Figure E15.11-12. Was Step 5 of the design procedure necessary in your solution? 

13. Signal generator #3 


Design an FSM capable of deriving, from clk, the waveform shown on the left of Figure E15.13-14, 
where T0 is the clock period. 

(Interval labels visible in the figure: 4T0, 8T0, 16T0, and 120T0, plus the period T.) 

FIGURE E15.13-14. 


14. Signal generator #4 


a. Using the approach described in Section 15.4 (see Example 15.7), design an FSM capable of 
deriving, from clk, the waveform shown on the right of Figure E15.13-14. 


b. Can you suggest another approach to solve this exercise? 
15. Generic signal generator design #1 


This exercise concerns the generic design technique for arbitrary binary waveform generators 
described in Section 15.7. 


a. Show that a glitch can only occur in x during the following mux transitions: (i) from clk to '1', (ii) 
from clk to clk’, (iii) from clk’ to '0', and (iv) from clk’ to clk. 


b. Employing the approach of Section 15.7, design a circuit that produces the waveform of Figure 
15.34(b). 


c. Applying the technique for reduction to a quasi-single machine described in Section 15.6, 
redraw the circuit obtained above. 


d. Can you suggest another “universal” approach for the design of signal generators (with maxi- 
mum resolution)? 


16. Generic signal generator design #2 


a. Using the technique described in Section 15.7, design a circuit capable of producing the waveform 
shown in Figure E15.16. Notice that the sequence that comprises this signal is 
{clk → '1' → clk' → clk' → '0'}. 

b. Applying the technique for reduction to a quasi-single machine described in Section 15.6, 
redraw the circuit obtained above. 


FIGURE E15.16. 

17. Car alarm 


Utilizing the FSM approach, design a circuit for a car alarm. As indicated in Figure E15.17(a), it 
should have four inputs, called clk, rst, sensors, and remote, and one output, called siren. For the 
FSM, there should be at least three states, called disarmed, armed, and intrusion, as illustrated in 
Figure E15.17(b). If remote ='1' occurs, the system must change from disarmed to armed or vice versa 
depending on its current state. If armed, it must change to intrusion when sensors ='1' happens, thus 
activating the siren (siren ='1'). To disarm it, another remote ='1' command is needed. 


Note: Observe that this machine, as depicted in Figure E15.17(b), exhibits a major flaw because it 
does not require remote to go to '0' before being valid again. For example, when the system changes 
from disarmed to armed, it starts flipping back and forth between these two states if the command 
remote ='1' lasts several clock cycles. 


Suggestion: The machine of Figure E15.17(b) can be fixed by introducing intermediate (tempo- 
rary) states in which the system waits until remote ='0' occurs. Another solution is to use some kind 
of flag that monitors the signal remote to make sure that only after it goes through zero a new state 
transition is allowed to occur. 


Hint: After solving this exercise, see Section 23.3. 


(a) FSM block with inputs remote, sensors, rst, and clk, and output siren. (b) States disarmed (siren=0), 
armed (siren=0), and intrusion (siren=1); the transitions are labeled remote=1 and sensors=1. 

FIGURE E15.17. 


18. Garage door controller 


Design a controller for an electric garage door, which, as indicated in Figure E15.18, should have, 
besides clock and reset, four other inputs: remote (='1' when the remote control is activated), open 
(='1' when the door is completely open, provided by a sensor), closed (='1' when the door is 
completely closed, also provided by a sensor), and timer (='1' 30 s after open='1' occurs). At the 
output, the following signals must be produced: power (when '1' turns the electric motor on) and 
direction (when '0' the motor rotates in the direction to open the door, when '1' in the direction to 
close it). 


The system should present the following features: 
i. If the remote is pressed while the door is closed, immediately turn the motor on to open it. 
ii. If the remote is pressed while the door is open, immediately turn the motor on to close it. 


iii. If the remote is pressed while the door is opening or closing, immediately stop it. If pressed 
again, the remote should cause the door to go in the opposite direction. 


iv. The door should not remain open for more than a certain amount of time (for example, 30s); 
this information is provided by an external timer. 

a. Given the specifications above, would any glitches (during state transitions) be a problem for 
this system? 


b. Estimate the number of flip-flops necessary to implement this circuit. Does the clock frequency 
affect this number? Why? 


c. Design this circuit using the formal FSM design technique described in Section 15.2 (assume a 
reasonable frequency for clk). 


Note: See the observation in Exercise 15.17 about how to avoid the effect of long remote='1' 
commands. 


(FSM block with inputs remote, open, closed, timer, rst, and clk, and outputs power and direction.) 

FIGURE E15.18. 


19. Switch debouncer 


When we press or change the position of a mechanical switch, bounces are expected to occur before 
the switch finally settles in the desired position. For that reason, any mechanical switch must be 
debounced in an actual design. This can be done by simply counting a minimum number of clock 
cycles to guarantee that the switch has been in the same state for at least a certain amount of time (for 
example, 5 milliseconds). In this exercise, the following debouncing criteria should be adopted: 


Switch closed (y='0'): x must stay low for at least 5 ms without interruption. 
Switch open (y='1'): x must stay high for at least 5 ms without interruption. 


a. Assuming that a clock with frequency 1kHz is available, design an FSM capable of debouncing the 
switch of Figure E15.19. However, before starting, estimate the number of DFFs that will be needed. 


b. How many DFFs would be needed if the clock frequency were 1 MHz? 
Note: In Chapter 23 (Exercise 23.3) you will be asked to solve this problem using VHDL. Check then 
if your predictions for the number of flip-flops made here were correct. 


(Switch circuit supplied by VDD=3.3V, with output x.) 

FIGURE E15.19. 

20. Quasi-single FSM #1 


Compare the two architectures for multi-machine designs described in Section 15.6 (Figures 15.32(b) 
and (c)). Is the reduction to a quasi-single machine always advantageous? (Hint: Think of an FSM 
with few states but a large number of output bits). 


21. Quasi-single FSM #2 


a. Employing the quasi-single-machine reduction procedure described in Section 15.6, redraw the 
circuit of Example 15.9 (Figure 15.33). 


b. Was it advantageous in this case? 


c. Would it be advantageous in the case of Example 15.10? 


15.11 Exercises with VHDL 


See Chapter 23, Section 23.5. 


Volatile Memories 


Objective: Virtually any digital system requires some kind of memory, so understanding how 
memories are built, their main features, and how they work is indispensable. Volatile memories are 
described in this chapter, while nonvolatile ones are described in the next. The following volatile types 
are included below: SRAM (regular, DDR, and QDR), DRAM, SDRAM (regular, DDR, DDR2, and 
DDR3), and CAM. 


Chapter Contents 


16.1 Memory Types 

16.2 Static Random Access Memory (SRAM) 

16.3 Dual and Quad Data Rate (DDR, QDR) SRAMs 

16.4 Dynamic Random Access Memory (DRAM) 

16.5 Synchronous DRAM (SDRAM) 

16.6 Dual Data Rate (DDR, DDR2, DDR3) SDRAMs 

16.7 Content-Addressable Memory (CAM) for Cache Memories 
16.8 Exercises 


16.1. Memory Types 


The enormous need in modern applications for large and efficient solid-state memory has driven MOS 
technology to new standards, so a discussion on MOS technology would not be complete without the 
inclusion of memories. 

Memories are normally classified according to their data-retention capability as volatile and nonvolatile. 
Volatile memory is also called RAM (random access memory), while nonvolatile is also (for historic rea- 
sons) called ROM (read only memory). Within each of these two groups, further classification exists, as 
shown in the lists below. Some promising next-generation memories are also included in this list and will 
be seen in Chapter 17. 

Volatile memories (Chapter 16): 


SRAM (static RAM) 

DDR and QDR (dual and quad data rate) SRAM 
DRAM (dynamic RAM) 

SDRAM (synchronous DRAM) 

DDR/DDR2/DDR3 SDRAM (dual data rate SDRAM) 


CAM (content-addressable memory) for cache memories 

Nonvolatile memories (Chapter 17): 

MP-ROM (mask-programmed ROM) 

OTP-ROM (one-time-programmable ROM, also called PROM) 

EPROM (electrically programmable ROM) 

EEPROM (electrically erasable programmable ROM) 

Flash (also called flash EEPROM) 

Next generation memories (Chapter 17): 

FRAM (ferroelectric RAM) 

MRAM (magnetoresistive RAM) 

PRAM (phase-change RAM) 


16.2 Static Random Access Memory (SRAM) 


SRAM is one of the most traditional memory types. Its cells are fast, simple to fabricate, and retain data 
without the need for refresh as long as the power is not turned off. A popular SRAM application is in the 
construction of computer cache memory (shown in Section 16.7). 


SRAM circuit 


A common SRAM cell implementation is the so-called 6T (six transistor) cell depicted in Figure 16.1(a). It 
consists of two cross-coupled CMOS inverters plus two access nMOS transistors responsible for connecting 
the inverters to the input/output bit lines (BLs) when the corresponding word line (WL) is asserted. 
A 3x3 (3 words of 3 bits each) SRAM array is illustrated in Figure 16.1(b). 


Memory-read 


One of the methods for reading the SRAM cell of Figure 16.1(a) consists of first precharging both BLs 
to VDD, which are then left floating. Next, WL is asserted, causing one of the BLs to be pulled down. This 
procedure is illustrated in Figure 16.2, which shows q='0' and q' ='1' as the current contents of the SRAM 
cell. The latter causes M1 to be ON and M2 to be OFF, while the former causes M3 to be OFF and M4 to 
be ON. After bit and bit’ have been both precharged to '1' and left floating, the access transistors, M5 and 
M6, are turned ON by the respective WL. Because M3 is OFF, the BL corresponding to bit’ will remain 
high, while the other will be pulled down (because M1 is ON), hence resulting in bit='0' and bit’ ='1' at 
the output. 
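The read sequence just described can be summarized with a purely logical Python sketch (no voltages, capacitances, or timing are modeled). The class name SramCell6T is hypothetical, and the behavior for a stored '1' is inferred by symmetry with the q='0' case described in the text.

```python
class SramCell6T:
    """Logic-level model of the read procedure of a 6T SRAM cell."""

    def __init__(self, q=0):
        self.q = q                  # stored bit; the complement is q'

    def read(self):
        # Step 1: precharge both bit lines to '1' (VDD), then leave them floating.
        bit, bit_n = 1, 1
        # Step 2: assert WL, turning the access transistors M5 and M6 ON.
        if self.q == 0:
            # q'='1' keeps M1 ON, so the path M5-M1 discharges the bit line.
            bit = 0
        else:
            # q='1' keeps M3 ON, so the path M6-M3 discharges the bit' line
            # (assumed by symmetry with the case above).
            bit_n = 0
        return bit, bit_n

for stored in (0, 1):
    print("q =", stored, "-> (bit, bit') =", SramCell6T(stored).read())
```

In both cases exactly one bit line ends up low, so the pair (bit, bit') reproduces the stored pair (q, q') without changing the cell contents.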

A major concern in the memory-read procedure described above is the large parasitic capacitance 
of the (long) bit lines. When M5 is turned ON (recall that M1 is already ON), the high voltage of the 
bit line causes the voltage on the intermediate node q (=0V) to be pulled up momentarily. The value 
of this ΔV depends on the relative values of the channel resistances of M1 and M5. Even though 
the voltage on node q will return to zero eventually, it should be prevented from going above the 
threshold voltage of M3 (~0.5V), or, at least, it should stay well below the transition voltage (VTR, 
Equation 9.7) of the M3-M4 inverter, otherwise M3 might be turned ON, thus reversing (corrupting) 
the stored bits. 

FIGURE 16.1. (a) Traditional 6T SRAM cell; (b) 3x3 SRAM array. 

FIGURE 16.2. One option for reading the 6T SRAM cell. 


This protection, called read stability, is achieved with M1 stronger (that is, with a larger channel width- 
to-length ratio) than M5. A minimum-size transistor can then be used for M5 (and M6), while a wider 
channel (typically around twice the width of M5) is employed for M1 (and M3). 

Another concern in this memory-read procedure regards the time needed for M1-M5 to pull the 
respective BL down, which, due to the large parasitic capacitance, tends to be high. To speed this process 
up, a sense amplifier (described later) is usually employed at each bit line. 

Another memory-read alternative is to precharge the BLs to VDD/2 instead of VDD, which prevents data 
corruption even if the transistors are not sized properly, and it has the additional advantage of causing 
a smaller swing to the BL voltages, thus reducing the power consumption and improving speed (at the 
expense of noise margin). 


Memory-write 


The memory-write procedure is illustrated in Figure 16.3(a). Suppose that the SRAM cell contains q='0', 
which we want to overwrite with a '1'. First, bit='1' and bit' = '0' are applied to the corresponding BLs, 
then WL is pulsed high. Due to the read stability constraint, M5 might not be able to turn the voltage on 
node q high enough to reverse the stored bits, so this must be accomplished by M6. This signifies that 
M6, in spite of M4 being ON, should be able to lower the voltage of q' sufficiently to turn M1 OFF (that is, 
below M1’s threshold voltage, ~0.5 V), or, at least, well below the transition voltage (VTR) of the M1-M2
inverter. For this to happen, the channel resistance of M6 must be smaller than that of M4 (that is, M6 
should be stronger than M4). Because M6 is nMOS and M4 is pMOS (which by itself guarantees a factor 
of about 2.5 to 3 due to the higher charge mobility of nMOS transistors—Sections 9.1 and 9.2), same-size 
transistors suffice to attain the proper protection. In summary, M2, M4, M5, and M6 can all be minimum- 
size transistors, while M1 and M3 must be larger. 

FIGURE 16.3. (a) SRAM memory-write procedure; (b) SRAM memory-write circuitry.


Finally, note that there are several ways of applying the '0' and '1' voltages to the BLs in Figure 16.3(a),
with one alternative depicted in Figure 16.3(b). Both BLs are precharged to VDD, then left floating; next,
a write-enable (WE) pulse is applied to the upper pair of nMOS transistors, causing one of the BLs to be
lowered to '0' while the other remains at '1'.


SRAM chip architecture 


A typical architecture for an SRAM chip is depicted in Figure 16.4. In this example, the size of the array
is 16x8x4, meaning that it contains 16 rows and 8 columns of 4-bit words (see the diagram on the right-
hand side of Figure 16.4). Two address decoders (Section 11.5) are needed to select the proper row and
the proper column. Because there are 16 rows and 8 columns, 4 (A0-A3) plus 3 (A4-A6) address bits are
required, respectively. As can be seen in the figure, the four data I/O pins (D0-D3) are bidirectional and
are connected to input and output tri-state buffers, which are controlled by the signals CE (chip enable),
WE (write enable), and OE (output enable).
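
The address-bit count and total capacity quoted above follow directly from the array dimensions. The short sketch below reproduces that arithmetic; the function name `array_geometry` and the dictionary keys are hypothetical, chosen only for illustration.

```python
def array_geometry(rows, cols, word_bits):
    """Return address-bit counts and total capacity for a rows x cols x word_bits array."""
    row_bits = (rows - 1).bit_length()   # bits needed to select one of `rows` rows
    col_bits = (cols - 1).bit_length()   # bits needed to select one of `cols` columns
    return {
        "row_address_bits": row_bits,
        "column_address_bits": col_bits,
        "total_address_bits": row_bits + col_bits,
        "total_bits": rows * cols * word_bits,
    }


geometry = array_geometry(rows=16, cols=8, word_bits=4)
print(geometry)
```

For the 16x8x4 example this gives 4 row bits, 3 column bits, and 512 stored bits in total.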

To write data into the memory, in='1' must be produced, which occurs when CE and WE are both
low. To read data, out='1' is required, which happens when CE and OE are low and WE is high. When
CE is high, the chip is powered down (this saves energy by lowering the internal supply voltage and by
disabling the decoders and sense amplifiers). Data, of course, remains intact while in the standby (power
down) mode.
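
The buffer-enable conditions just described can be captured in a small truth-table sketch. It assumes the three control pins are active low (0 means asserted), as stated in the text; the function name `sram_control` is hypothetical.

```python
def sram_control(ce_n, we_n, oe_n):
    """Return (in_enable, out_enable) for active-low CE, WE, and OE pins."""
    in_enable = ce_n == 0 and we_n == 0                  # write: CE and WE low
    out_enable = ce_n == 0 and oe_n == 0 and we_n == 1   # read: CE and OE low, WE high
    return int(in_enable), int(out_enable)


# Print the full truth table for the three control pins.
for ce_n in (0, 1):
    for we_n in (0, 1):
        for oe_n in (0, 1):
            print(ce_n, we_n, oe_n, "->", sram_control(ce_n, we_n, oe_n))
```

Note that the two enables are never '1' simultaneously, so the input and output tri-state buffers cannot drive the bidirectional data pins at the same time.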


Sense amplifier 


To speed up SRAM memory accesses, a sense amplifier is used at each bit line. An example is depicted in 
Figure 16.5, which contains two sections, that is, an equalizer and the sense amplifier proper. 

As described earlier, the memory-read procedure starts with a precharge phase in which both BLs are
precharged to VDD. That is the purpose of the equalizer shown in the upper part of Figure 16.5. When
equalize'='0', all three pMOS transistors are turned ON, raising and equalizing the voltages on all BLs. To
increase speed, it is not necessary to wait for the BLs to be completely precharged; however, it is
indispensable that they be precharged to exactly the same voltage, which is achieved by the horizontal pMOS
transistor. When the BLs have been precharged and equalized, they are allowed to float (equalize'='1').
Next, one of the WLs is selected, so each SRAM cell in that row will pull one of the BLs down. After the
voltage difference between bit and bit' has reached a certain value (for example, 0.5 V), sense is asserted,
powering the cross-coupled inverters that constitute the sense amplifier. The side with the higher
voltage will turn the nMOS transistor of the opposite inverter ON, which will then help pull its BL down
and will also turn the opposite pMOS transistor ON, hence establishing a positive-feedback loop that
causes one of the BL voltages to rapidly decrease toward '0' while the other remains at '1'. Also, as
mentioned earlier (in the description of the memory-read procedure), the BLs can be precharged to VDD/2
instead of VDD, which has the additional benefits already described (at the expense of noise margin).

FIGURE 16.4. Typical SRAM chip architecture. In this example, the array size is 16 rows by 8 columns of
4-bit words (see the diagram on the right). Two address decoders, 7 address bits, and 4 I/O lines are needed.
Memory-write occurs when WE=CE='0', while memory-read happens when OE=CE='0' and WE='1'. CE='1'
causes the chip to be powered down.

FIGURE 16.5. Differential sense amplifier (plus equalizer) for SRAM memories.

16.3 Dual and Quad Data Rate (DDR, QDR) SRAMs


As seen above, conventional SRAM chips (Figure 16.4) are asynchronous. To improve and optimize 
operation, modern SRAM architectures are synchronous, so all inputs and outputs are registered and all 
operations are controlled directly by the system clock or by a clock derived from it (hence synchronized 
to it). Moreover, they allow DDR (dual data rate) operation, which consists of processing data (that is, 
reading or writing) at both clock transitions, hence doubling the throughput. 

An extension of DDR is QDR (quad data rate), which is achieved with the use of two independent 
data buses, one for data in (memory-write) and the other for data out (memory-read), both operating in 
DDR mode. 

Dual-bus operation is based on dual-port cells, which are derived directly from the 6T SRAM cell seen 
in Figure 16.1(a), with two options depicted in Figure 16.6. In Figure 16.6(a), the “write” part of the cell 
is shown, which utilizes the left bit line to enter the bit value to be written into the cell. In Figure 16.6(b), 
the “read” part is shown, with the read bit output through the right bit line (bit*). In Figure 16.6(c), 
these two parts are put together to produce the complete dual-port cell. Another alternative is shown in 
Figure 16.6(d), which is simpler but requires more word and bit lines and also exhibits a poor isolation 
between the bita/bita’ and bitb/bitb’ bit lines. 

A simplified diagram for a QDR SRAM chip is shown in Figure 16.7, which shows two data buses
(data_in and data_out) plus an address bus, all of which are registered. Two clocks are shown, called K
and C; the former is for writing, while the latter is for reading. R and W are read and write control
signals, respectively. In this example, the memory size is 72 Mbits, distributed in 2M rows, each with
a single 36-bit word. As in the SDRAMs described ahead, the operation of QDR SRAMs is based on
synchronous pipelined bursts.

To conclude the study of SRAMs, a summary of typical specifications for large, modern, single-die, 
standalone QDR SRAM chips is presented below. 


■ Density: 72 Mbits
■ Bit organization: rows and columns
■ Maximum operating frequency: 400 MHz
■ Maximum data rate: 800 Mbps in + 800 Mbps out per line
■ Burst length: 1, 2, 4, or 8
■ Supply voltage: 1.8 V
■ I/O type: HSTL-18 (Section 10.9)

FIGURE 16.6. Dual-port SRAM cell. In (a) and (b), the “write” and “read” parts of the 6T cell are shown,
respectively, which are put together in (c) to produce the complete dual-port cell. A simpler dual-port cell is
depicted in (d), but it requires more word and bit lines, and the bit lines are poorly isolated.

FIGURE 16.7. Simplified QDR SRAM chip diagram.
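
The density and data-rate figures listed for the QDR SRAM can be cross-checked with a few lines of arithmetic. The sketch below assumes that "2M rows" means 2 x 2^20 rows and that each data line transfers one bit per clock edge (DDR mode); the function name `qdr_figures` is hypothetical.

```python
def qdr_figures(rows, word_bits, clock_hz):
    """Return (density in bits, bit rate per data line, bit rate per data bus)."""
    density_bits = rows * word_bits
    rate_per_line = 2 * clock_hz              # one bit at each clock edge (DDR)
    rate_per_bus = rate_per_line * word_bits  # all lines of one bus together
    return density_bits, rate_per_line, rate_per_bus


density, per_line, per_bus = qdr_figures(rows=2 * 2**20, word_bits=36, clock_hz=400_000_000)
print(density // 2**20, "Mbits")
print(per_line // 10**6, "Mbps per line, in each direction")
print(per_bus / 1e9, "Gbps per 36-bit bus")
```

Because the input and output buses are independent, the per-line figure applies to data in and data out simultaneously, which is what distinguishes QDR from plain DDR operation.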


16.4 Dynamic Random Access Memory (DRAM) 


Like SRAMs, DRAMs are also volatile. However, contrary to SRAMs, they must be refreshed periodically
(every few milliseconds) because information is stored on very small capacitors. They are also
slower than SRAMs. On the other hand, DRAM cells are very small and inexpensive, hence allowing the 
construction of very dense low-cost memory arrays. A very popular application of DRAMs is as main 
(bulk) computer memory. 


DRAM circuit 


The most popular DRAM cell is the 1T-1C (one transistor plus one capacitor) cell, depicted in the 3x3
array of Figure 16.8(a). The construction of the capacitor is a very specialized task; examples are trench 
capacitor (constructed vertically) and stacked capacitor (multilayer). A simplified diagram illustrating the 
construction of the 1T-1C cell using a trench capacitor is presented in Figure 16.8(b). 


Memory-read 


To read data from this memory, first the BL is precharged to VDD/2 and then left floating. Next, the
proper WL is asserted, causing the voltage of BL to be raised (if a '1' is stored) or lowered (when the cell
contains a '0').

FIGURE 16.8. (a) 3x3 DRAM array employing 1T-1C cells; (b) Simplified cell view with trench capacitor.

As with the SRAM cell, a major concern in this procedure is the large parasitic capacitance of the
(long) bit line, $C_{BL}$ (illustrated at the bottom-left corner of Figure 16.8(a)). Because a single transistor must
pull BL down or up and its source is not connected to ground, but rather to the cell capacitor, $C_{cell}$, the
voltage variation $\Delta V$ that it will cause on BL depends on the relative values of these two capacitors,
that is, $\Delta V = \frac{C_{cell}}{C_{BL} + C_{cell}} \cdot \frac{V_{DD}}{2}$. Because $C_{cell} \ll C_{BL}$ (for compactness), $\Delta V$ can be very small, hence
reducing the noise margin and making the use of sense amplifiers indispensable. Note that the reading
procedure alters the charge stored in the DRAM cells, so the sense amplifiers’ outputs must be rewritten
into the memory array.
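
To get a feel for how small the bit-line swing can be, the sketch below evaluates the charge-sharing relation ΔV = [Ccell/(CBL + Ccell)]·(VDD/2), with the sign chosen according to the stored bit. The capacitance and supply values are hypothetical, picked only to illustrate the case Ccell << CBL, and the function name `dram_read_swing` is an assumption.

```python
def dram_read_swing(c_cell, c_bl, vdd, stored_bit):
    """Bit-line voltage change caused by charge sharing with the cell capacitor.

    The bit line is assumed precharged to vdd/2; the cell is assumed to hold
    vdd (stored_bit = 1) or 0 V (stored_bit = 0) before the word line rises.
    """
    magnitude = c_cell / (c_bl + c_cell) * vdd / 2
    return magnitude if stored_bit else -magnitude


# Hypothetical values: 30 fF cell, 300 fF bit line, 1.8 V supply.
for bit in (1, 0):
    dv = dram_read_swing(c_cell=30e-15, c_bl=300e-15, vdd=1.8, stored_bit=bit)
    print(f"stored '{bit}': bit-line change = {dv * 1e3:+.1f} mV")
```

With these assumed values the swing is under 0.1 V, which illustrates why a sense amplifier is needed to resolve the stored bit.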


Memory-write 

To write data into the DRAM cell of Figure 16.8(a), the bit values must be applied to the bit lines, then
the proper word line is pulsed high, causing the capacitors to be charged (if bit='1') or discharged (if
bit='0'). Note that, due to the threshold voltage ($V_T$) of the nMOS transistor, $C_{cell}$ is charged to $V_{DD} - V_T$
instead of $V_{DD}$.


DRAM chip architecture 


An example of DRAM chip architecture is depicted in Figure 16.9. In this case, the array size is 256 rows 
by 256 columns of 4-bit words. All inputs are registered, while the outputs are not. The row and column 
addresses (a total of 16 bits) are multiplexed. The memory array is controlled by the signals (1)—(6) gen- 
erated by the timing and control unit after processing OE (output enable), WE (write enable), RAS (row 
address strobe), and CAS (column address strobe). As can be seen, (1)-(3) serve to store the row address, 
the column address, and the input data, respectively, while (4) and (5) control the input and output 
tri-state buffers. (6) is used to power the chip down (standby mode) to save energy (this is normally 
achieved by lowering the internal supply voltage and by disabling the decoders and sense amplifiers). 
As in the SRAM chip, data are obviously maintained while in standby mode. 

The write/read sequences for the device of Figure 16.9 can be summarized as follows. To write data,
the row address is presented, then RAS is lowered to store that address. Next, the column address is
presented, and CAS is lowered to store it. Finally, WE is lowered to latch the input data and store it into
the memory. To read data, the same address storage sequence is needed, followed by OE='0', which
activates the output buffers.
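
The write and read sequences above can be mimicked with a small behavioral model. This is only a sketch: it ignores timing, refresh, and the analog behavior of the cells, and the class name `MultiplexedDram`, its method names, and the choice of sending the upper address half as the row are all assumptions made for illustration.

```python
class MultiplexedDram:
    """Toy behavioral model of a 256 x 256 x 4 DRAM with multiplexed addresses."""

    def __init__(self, rows=256, cols=256, word_bits=4):
        self.word_mask = (1 << word_bits) - 1
        self.cells = [[0] * cols for _ in range(rows)]
        self.row = 0
        self.col = 0

    def lower_ras(self, address_pins):
        self.row = address_pins          # RAS falling: latch the row address

    def lower_cas(self, address_pins):
        self.col = address_pins          # CAS falling: latch the column address

    def lower_we(self, data_pins):
        self.cells[self.row][self.col] = data_pins & self.word_mask  # store the word

    def lower_oe(self):
        return self.cells[self.row][self.col]  # enable the output buffers


def split_address(address, col_bits=8):
    """Split a flat address into (row, column); upper half assumed to be the row."""
    return address >> col_bits, address & ((1 << col_bits) - 1)


dram = MultiplexedDram()
row, col = split_address(0x1234)

# Write: row address + RAS, column address + CAS, then WE with the data.
dram.lower_ras(row)
dram.lower_cas(col)
dram.lower_we(0b1010)

# Read: same address sequence, then OE.
dram.lower_ras(row)
dram.lower_cas(col)
value = dram.lower_oe()
print(f"row={row:#04x} col={col:#04x} value={value:04b}")
```

Because the row and column halves share the same pins, only 8 address pins are needed for the 16-bit address of this example.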

FIGURE 16.9. Simplified view of a conventional DRAM chip. In this example, the array size is 256 rows by 256
columns of 4-bit words. All inputs are registered, while the outputs are not. The row and column addresses
(total of 16 bits) are multiplexed. The memory array is controlled by the signals (1)-(6) generated by the
timing and control unit after processing OE (output enable), WE (write enable), RAS (row address strobe), and
CAS (column address strobe).


DRAM refresh 


One important difference between DRAMs and SRAMs is the need for refresh in the former, which 
makes them more complex to operate. To refresh the memory, several schemes are normally available, 
among which is the CAS-before-RAS procedure, which consists of lowering CAS before RAS is lowered. 
With CAS low, every time RAS is pulsed low one row is refreshed. In this case, the external addresses are 
ignored and the row addresses (needed for refresh) are generated internally by a counter in the timing 
and control unit. This sequence of pulses must obey the maximum delay allowed between refreshes. For 
example, if the maximum delay is 16 ms, then the minimum frequency for the RAS pulses in the example
of Figure 16.9 is 256/16 ms = 16 kHz.
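
The refresh-rate calculation above generalizes to any array: one row is refreshed per RAS pulse, so all rows must be visited within the maximum allowed delay. A minimal sketch follows; the function name `min_refresh_frequency_hz` is hypothetical.

```python
def min_refresh_frequency_hz(rows, max_delay_ms):
    """Minimum RAS pulse frequency so that every row is refreshed within max_delay_ms."""
    return rows * 1000 / max_delay_ms


frequency = min_refresh_frequency_hz(rows=256, max_delay_ms=16)
print(f"{frequency / 1e3:.0f} kHz")
```

Doubling the number of rows, or halving the allowed delay, doubles the required pulse frequency.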


Sense amplifier 


As with SRAMs, many sense amplifiers have been developed for DRAM memories. An example
(called open bit-line architecture) is shown in Figure 16.10, which is an extension of that used in
Figure 16.5. Because DRAMs are single-ended, while SRAMs exhibit differential outputs, to employ
the differential circuit described previously each bit line is broken into two halves, as indicated in
Figure 16.10(a), where for simplicity only one column is displayed. This column was redrawn horizontally
in Figure 16.10(b), with the sense amplifier inserted in the center, plus two dummy DRAM cells,
one on each side of the amplifier. A memory-read process starts with equalize='1', which precharges
and equalizes the voltages of BL1 and BL2 to VDD/2. During this time, dummy1 and dummy2 are also
asserted, charging their respective capacitors also to VDD/2. Next, the BLs and the dummies are left
floating, and the proper WL is asserted. When a WL on the left is selected, dummy2 is asserted as
well, maintaining BL2 at VDD/2, which acts as a reference voltage for the differential sense amplifier.
The voltage of BL1 will necessarily grow or decrease, depending on whether the selected cell
contains a '1' or a '0', respectively. After the differential voltage between BL1 and BL2 reaches a certain
value (for example, 0.2 V), sense is asserted, activating the sense amplifier, and dummy2 is unasserted.
The amplifier causes BL1 to rapidly move toward '1' or '0', while BL2 moves in the opposite direction,
thus producing bit=BL1 and bit'=BL2 at the output. Note that when a WL on the right of the
sense amplifier is selected, dummy1 is used to produce the reference voltage, in which case bit=BL2
and bit'=BL1.

FIGURE 16.10. Differential sense amplifier (called open bit-line architecture) for DRAM arrays.


16.5 Synchronous DRAM (SDRAM) 


Next we describe several improvements that have been incorporated into traditional DRAM chips. These
improvements, however, affect only their operation because the storage cell still is the 1T-1C cell seen in
Figure 16.8. The following architectures are covered below: SDRAM, DDR SDRAM, DDR2 SDRAM, and
DDR3 SDRAM.

Instead of conventional DRAMs, modern main computer memories are constructed with synchronous
DRAMs (SDRAMs). While the former require a separate controller, in the latter the whole system is
built on the same chip. By eliminating the need for communication between separate chips, a speed
closer to that of the CPU is achieved.

A simplified diagram for an SDRAM chip is depicted in Figure 16.11. As can be seen, the inputs and
outputs are all registered, and the overall operation is controlled by an external clock derived from
(hence synchronized to) the system clock. The main inputs are address and data, while the only output is
data (bidirectional). There are also clock (CLK) and clock enable (CKE) inputs, plus five control inputs,
that is, CS (chip select), WE (write enable), RAS (row address strobe), CAS (column address strobe), and
DQM (data mask). As before, RAS and CAS store row and column addresses, respectively, at the positive
clock edge; WE enables the memory-write operation; and CS enables or disables the device (when high,
it masks all inputs except CLK, CKE, and DQM). The purpose of CKE, when low, is to freeze the clock,
to power down the chip (standby mode), and also to activate the self-refresh mode (described below).
Finally, DQM serves to mask the input and output data when active (the inputs are masked and the
output buffers are put in high-impedance mode).

FIGURE 16.11. Simplified SDRAM chip diagram.

Comparing the diagram of Figure 16.11 to that of a traditional DRAM (Figure 16.9), several differ- 
ences are observed. First, to cope with the large memory size, a three-dimensional addressing scheme 
is employed, that is, besides row and column addresses, a block address is also included. In this 
example, the total memory size is 1 Gbit, divided into 4 blocks, each with 8192 rows by 8192 columns
of 4-bit words. Another difference is the presence of CLK and CKE, which are needed because this 
DRAM is synchronous. As mentioned above, when CKE='1' the memory operates as usual, while 
CKE='0' freezes the clock, powers the chip down (standby mode), and also activates the self-refresh 
scheme. 

Self-refresh is another important feature of SDRAMs, which present two refresh schemes, called
auto-refresh and self-refresh. The former is similar to the CAS-before-RAS mode described for regular DRAMs,
while the latter is even simpler because no external clocking is needed. When CKE is lowered, the timing
and control unit generates not only the addresses, but also the clock signal (with the appropriate
minimum refresh period) needed to refresh all SDRAM locations. The minimum refresh period is typically
in the 10 ms to 100 ms range. Suppose that for the SDRAM of Figure 16.11 it is 64 ms. Then, taking into
account that all blocks and columns are refreshed simultaneously (one row at a time, of course), only row
addresses need to be generated. Because there are 8192 rows, the minimum frequency for the internal
refresh clock is 8192/64 ms = 128 kHz.

Another important feature of SDRAMs is that they are burst oriented, that is, a memory-read
command causes several sequential (or interleaved) data words to be read, with the burst length
programmable between 1 and 8. There is, however, a clock latency (CL, referred to as CAS latency in
SDRAM data sheets) penalty associated with every memory-read command (due to pipelined operation).
When such a command is given, the controller must wait CL clock cycles (CL is measured in clock
cycles, and it ranges typically between 2 and 5) before the first data-out word is valid. However, the
succeeding words are retrieved at full clock speed.
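
The effect of the latency on a burst can be made concrete with a short sketch. It assumes that the read command is issued at clock cycle 0, that the first word becomes valid CL cycles later, and that one further word follows on each subsequent cycle; the function name `burst_schedule` is hypothetical.

```python
def burst_schedule(cl, burst_length):
    """Clock cycles (counted from the read command) at which each burst word is valid."""
    return [cl + word for word in range(burst_length)]


cycles = burst_schedule(cl=3, burst_length=4)
print(cycles)
print(f"average cycles per word: {(cycles[-1] + 1) / len(cycles):.2f}")
```

Under these assumptions, longer bursts spread the fixed latency over more words, so the average cost per word approaches one clock cycle.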

The SDRAM architecture described above served as the basis for the creation of the so-called dual
data rate (DDR) versions of SDRAM, which allow faster memory access and lower power consumption.
Therefore, even though SDRAM chips, as originally conceived, are now of little interest, the
versions derived from them are used as bulk (main) memory in virtually any computer (desktop or
laptop) as well as in virtually any other computing device. Such SDRAM derivations are described
below.


16.6 Dual Data Rate (DDR, DDR2, DDR3) SDRAMs 


As mentioned above, dual data rate (DDR) is a variation of the SDRAM architecture described in the 
previous section. DDR, DDR2, and DDR3 memories transfer data at both clock transitions (positive 
and negative clock edges), thus achieving twice the throughput for the same clock frequency. They are 
used as main memory in personal computers, where they operate over a 64-bit data bus. Due to their
compactness, low cost, and relatively high performance, they are also used as main memory in all sorts of
electronic gadgets.

DDR is a perfected SDRAM memory, operating with a power supply of 2.5 V, against 3.3 V of regular
SDRAM. The specified minimum and maximum clock frequencies are 100 MHz and 200 MHz.
A 2-bit prefetch is employed so data can be transferred at both edges, achieving 200 to 400 MTps
(Tps = transfers per second). Because each transfer contains 64 bits (8 bytes), the total data rate is
1.6 to 3.2 GBps. The type of I/O used to access this memory is SSTL_2 (studied in Section 10.9; see
Figures 10.29 and 10.32).

All SDRAM memories for computer applications are available in the form of modules. A module is 
a small printed circuit board on which several SDRAM chips are installed, creating a larger memory 
array, which can be plugged directly into receptacles available on the motherboard. Such modules are 
known by the acronym DIMM (dual inline memory module). For DDRs intended for desktop PCs, the 
DIMM contains 184 pins, like that shown in Figure 16.12. The capacity of such modules normally falls in
the 256 MB-2 GB range.

DDR2 differs from DDR in its internal construction, allowing twice the throughput. It is operated 
with two clocks, one for the memory, another for the I/O bus. The memory clock still ranges from 100 
to 200 MHz, as in DDR, but now the prefetch depth is 4 bits instead of 2. The I/O clock operates at twice 
the speed, so because data is transferred at both of its edges, 400 to 800 MTps occur. Because each transfer 
again contains 64 bits (8 bytes), the total data rate is 3.2 to 6.4 GBps.

FIGURE 16.12. DIMM (dual inline memory module) with DDR SDRAM chips for desktop computers.

A comparison between DDR, DDR2, and DDR3 is shown in Figure 16.13. A major improvement in
DDR2 with respect to DDR is ODT (on-die termination), which consists of bringing the termination
resistors used for impedance matching (to avoid reflected signals in the PCB traces, which are treated
as transmission lines) from the PCB into the die. Such resistors, of values 50 Ω, 75 Ω, and 150 Ω,
are selectively connected between the data pins and VDD or GND to optimize high-frequency I/O
operation. Another important difference is the lower power consumption of DDR2 (the power supply
was reduced from 2.5 V to 1.8 V). The I/O standard used in DDR2 is SSTL_18 (Section 10.9; see
Figures 10.29 and 10.33).

Typical DDR2 chip densities are in the 256 Mb-2 Gb range. The corresponding DIMM, for the case of
desktop PCs, contains 240 pins, with a total memory generally in the 512 MB-4 GB range. Denser DIMMs
for other applications also exist.

The most recent addition to the SDRAM family is DDR3. As shown in Figure 16.13, it too operates 
with two clocks, one for the memory, the other for the I/O bus. The memory clock still ranges from 100 to 
200 MHz, as in DDR and DDR2, but now the prefetch depth is 8 bits, and the I/O clock frequency is four 
times that of the memory clock. Because data are transferred at both I/O clock edges, 800 to 1600 MTps 
can occur. Because each transfer again contains 64 bits (8 bytes), the total data rate is 6.4 to 12.8 GBps. 
Note in Figure 16.13 that DDR3 employs a power supply of 1.5 V, which causes a power consumption 
about 30% lower than DDR2. The I/O standard used in DDR3 is SSTL_15 (Section 10.9). 
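
The transfer rates and data rates quoted for the three generations all follow the same arithmetic: the memory clock multiplied by the prefetch depth gives the transfer rate, and each transfer carries 8 bytes over the 64-bit bus. The sketch below reproduces the figures from the text; the function name `ddr_rates` and the dictionary of prefetch depths are written only for illustration.

```python
def ddr_rates(mem_clock_mhz, prefetch, bus_bytes=8):
    """Return (transfer rate in MTps, data rate in GBps) for one memory clock value."""
    transfers_mtps = mem_clock_mhz * prefetch
    data_rate_gbps = transfers_mtps * bus_bytes / 1000
    return transfers_mtps, data_rate_gbps


# Prefetch depths as given in the text: 2 (DDR), 4 (DDR2), 8 (DDR3).
PREFETCH = {"DDR": 2, "DDR2": 4, "DDR3": 8}

for name, depth in PREFETCH.items():
    low = ddr_rates(100, depth)    # minimum memory clock, 100 MHz
    high = ddr_rates(200, depth)   # maximum memory clock, 200 MHz
    print(f"{name}: {low[0]}-{high[0]} MTps, {low[1]}-{high[1]} GBps")
```

The memory clock range is the same in all three cases; the gain comes entirely from the deeper prefetch and the correspondingly faster I/O clock.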

DDR3 chip densities are expected to range typically from 512 Mb to 4 Gb. The corresponding DIMM,
for the case of desktop PCs, again contains 240 pins, with the total memory expected to be typically in
the 512 MB-8 GB range.

FIGURE 16.13. Main features of DDR, DDR2, and DDR3 SDRAM memories.

16.7 Content-Addressable Memory (CAM) for Cache Memories


The last volatile memory to be described is CAM (content-addressable memory). To read a traditional 
memory, an address is presented, to which the memory responds with the contents stored in that 
address. The opposite occurs in a CAM: instead of an address, a content is presented, to which the mem- 
ory responds with a hit if such a content is stored in the memory, or with a miss if it is not. This type of 
memory, also called associative memory, is used in applications where performing a “match” operation is 
necessary. 

A simplified view of a CAM application is shown in Figure 16.14. It consists of a regular computer 
memory structure, where the CPU operates with a cache memory, followed by the main memory, and 
finally a hard disk. The cache is constructed in the same chip as the CPU, so they can communicate at 
full clock speed. The main memory, which is of DRAM type, is constructed in separate chips, so it com- 
municates with the CPU at a much lower speed than the cache. The hard disk is even slower, so it is the 
most undesirable data location for high-speed programs. 

As can be seen, the cache memory consists of two main blocks (besides cache control), that is, CAM
and SRAM. The former stores tags, while the latter stores data. The tags are (part of) the addresses of the
main memory whose contents are available also in the cache (SRAM). In other words, the CAM stores
addresses while the SRAM stores the corresponding data.

In this example we assume that the cache is fully associative. In simplified terms, its operation
is as follows. To retrieve data from the memory system, the CPU sends an address out through the
address bus. The CAM compares the received address against all addresses (tags) stored in it (in a single
iteration), responding with hit='1' when a match is found or with hit='0' (a miss) otherwise. When a hit
occurs, the match line of the matched tag asserts the corresponding word line (WL) in the SRAM,
causing its content to be placed on the data bus (hit='1' is necessary to indicate that it is a valid data word).
When a miss occurs, the CPU retrieves the data from the main memory, in which case the data are copied
also into the cache for future use.
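
The hit/miss behavior just described can be sketched in software. The model below keeps the tags (the CAM contents) and the data (the SRAM contents) in two parallel lists. The class name `FullyAssociativeCache`, the use of a dictionary as main memory, and the first-in-first-out replacement of old entries are assumptions made for illustration; the text does not specify a replacement policy.

```python
class FullyAssociativeCache:
    """Toy model of a fully associative cache: tags in a CAM, data in an SRAM."""

    def __init__(self, main_memory, capacity=4):
        self.main_memory = main_memory  # dict: address -> data word
        self.capacity = capacity
        self.tags = []                  # CAM contents (addresses)
        self.data = []                  # SRAM contents (data words)

    def read(self, address):
        """Return (hit, data) for the requested address."""
        # The hardware compares all tags at once; a loop stands in for that here.
        for row, tag in enumerate(self.tags):
            if tag == address:
                return 1, self.data[row]
        # Miss: fetch from main memory and copy into the cache for future use.
        value = self.main_memory[address]
        if len(self.tags) >= self.capacity:
            self.tags.pop(0)            # assumed FIFO replacement
            self.data.pop(0)
        self.tags.append(address)
        self.data.append(value)
        return 0, value


cache = FullyAssociativeCache(main_memory={0x10: 0xAB, 0x20: 0xCD})
print(cache.read(0x10))  # first access to this address
print(cache.read(0x10))  # second access to the same address
```

The first access to an address is a miss served by the main memory; the second finds the tag in the CAM and is served by the cache.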

FIGURE 16.14. Simplified view of a CAM application, as part of a fully associative cache memory in a regular
computer memory structure.

FIGURE 16.15. (a) 9T CAM cell; (b) 10T CAM cell; (c) Generation of “hit” with pseudo-nMOS logic.

Figures 16.15(a) and (b) show two CAM cell implementations, called 9T (nine-transistor) CAM
and 10T CAM, respectively. The upper part of either cell is a conventional 6T SRAM (Figure 16.1(a)),
while the other transistors constitute the match circuitry. If the input bit does not coincide with
that stored in the SRAM, the match line (ML) is pulled down. Therefore, for a ML to remain
high, all incoming bits must match all bits stored in the CAM cells that belong to the same row.
This computation is illustrated in Figure 16.15(c), where pseudo-nMOS logic (Section 10.7) was
employed. If all MLs are pulled down, then hit='0' (a miss) is produced by the final OR gate, while
hit='1' occurs when a match is found. Notice that the CAM match lines (MLs) are word lines (WLs)
for the SRAM array.
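
At the bit level, the match computation amounts to one wired comparison per row followed by an OR over all rows. The sketch below mirrors that structure: a match line stays at 1 only if every stored bit equals the corresponding input bit, and hit is the OR of all match lines. The function names `match_lines` and `hit_signal` and the stored words are hypothetical.

```python
def match_lines(stored_rows, search_word):
    """Return one match-line value per CAM row (1 = all bits in the row match)."""
    return [
        int(all(stored == incoming for stored, incoming in zip(row, search_word)))
        for row in stored_rows
    ]


def hit_signal(mls):
    """hit = OR of all match lines."""
    return int(any(mls))


cam = [
    (1, 0, 1, 1),
    (0, 0, 1, 0),
    (1, 1, 0, 0),
]
mls = match_lines(cam, (0, 0, 1, 0))
print(mls, hit_signal(mls))
```

The resulting list of match-line values plays the role of the word lines of the SRAM array: the row whose match line is 1 is the row whose data word is read.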


16.8 Exercises 
1. SRAM array 
Consider the SRAM architecture depicted in Figure 16.4. 
a. How many transistors are needed to construct its 16x8x4 SRAM core?


b. Explain how the address decoders work (see Section 11.5). How many output bits does each 
decoder have? How many bits are active at a time? 


2. DDR/QDR SRAM array


a. If, instead of SRAM cells, DDR SRAM cells were employed in the 16x8x4 core of Figure 16.4,
how many transistors would be needed?


b. And if QDR SRAM cells were used instead? 
3. QDR SRAM chip 


In Figure 16.7 a simplified diagram for a QDR SRAM chip was presented. Additionally, typical 
parameters for this kind of chip were included at the end of Section 16.3. 


a. Check memory data sheets for the largest chip of this type that you can find. What is its bit
capacity?
b. Make a simplified diagram for it (as in Figure 16.7).
c. Check its speed, supply voltage, and type of I/O. 
4. SRAM versus DRAM 
a. Briefly describe the main conceptual differences between SRAM and DRAM. 
b. Physically speaking, how is a bit stored in an SRAM cell, and how is it stored in a DRAM cell?


c. If the 16x8x4 core of Figure 16.4 were constructed with DRAM cells, how many transistors 
(and capacitors) would be needed? 


d. Which is used as main computer memory and which is used as cache memory? Why? 
5. DRAM versus SDRAM 

a. Briefly describe the main conceptual differences between DRAM and SDRAM. 

b. Are they conceptually intended for the same applications? 
6. DDR SDRAM versus regular SDRAM 


a. Briefly describe the technical evolutions that occurred in the transition from regular SDRAM to 
DDR SDRAM. 


b. Check in data sheets the main parameters for DDR SDRAM chips and compare them against 
those listed in the DDR column of Figure 16.13. 


7. DDR SDRAM modules 
As explained in Section 16.6, DDRs are normally assembled in DIMMs. 


a. Check in data sheets for the number of pins and general appearance of DIMMs intended for 
desktop computers. 


b. Look for the largest (in bytes) DIMM that you can find for the application above. 
c. Repeat parts (a) and (b) for DIMMs intended for laptop computers. 
8. DDR2 versus DDR SDRAM 


a. Briefly describe the technical evolutions that occurred in the transition from DDR SDRAM to 
DDR2 SDRAM. 


b. Check in data sheets the main parameters for DDR2 SDRAM chips and compare them against 
those listed in the DDR2 column of Figure 16.13. 


9. DDR2 SDRAM modules 
As explained in Section 16.6, DDR2 memories are normally assembled in DIMMs. 


a. Check in data sheets (or JEDEC standards) for the number of pins and general appearance of 
DDR2 DIMMs intended for desktop computers. 


b. Look for the largest (in bytes) DDR2 DIMM that you can find for the application above. 
c. Repeat parts (a) and (b) for DDR2 DIMMs intended for laptop computers. 
10. DDR3 versus DDR2 SDRAM

a. Briefly describe the technical evolutions that occurred in the transition from DDR2 SDRAM to
DDR3 SDRAM.

b. Check in data sheets the main parameters for DDR3 SDRAM chips and compare them against
those listed in the DDR3 column of Figure 16.13.

11. DDR3 SDRAM modules
As explained in Section 16.6, DDR3 memories are normally assembled in DIMMs.

a. Check in data sheets (or JEDEC standards) for the number of pins and general appearance of
DDR3 DIMMs intended for desktop computers.

b. Look for the largest (in bytes) DDR3 DIMM already available for the application above.
c. Repeat parts (a) and (b) for DDR3 DIMMs intended for laptop computers.

12. CAM array

a. Check the operation of both CAM cells shown in Figure 16.15.

b. How many transistors are needed to construct a CAM array similar to that in Figure 16.15(c)
with 256 32-bit rows using the CAM cell of Figure 16.15(a)?

13. Cache memory

Briefly explain the operation of the cache memory shown in Figure 16.14. Why is CAM (Figure 16.15)
needed to construct it?




17 Nonvolatile Memories


Objective: The study of memories was initiated in Chapter 16, where volatile types were described, 
and it concludes in this chapter, where nonvolatile memories are presented. The following nonvolatile 
types are included below: MP-ROM, OTP-ROM, EPROM, EEPROM, and flash memory. A special section 
on promising next-generation nonvolatile memories is also included, where FRAM, MRAM, and PRAM 
memories are described. 


Chapter Contents 


17.1 Memory Types 

17.2 Mask-Programmed Read Only Memory (MP-ROM) 
17.3 One-Time-Programmable ROM (OTP-ROM) 

17.4 Electrically Programmable ROM (EPROM) 

17.5 Electrically Erasable Programmable ROM (EEPROM) 
17.6 Flash Memory 

17.7 Next-Generation Nonvolatile Memories 

17.8 Exercises 


17.1 Memory Types 


As mentioned in Section 16.1, nearly all modern digital designs require memory, which can be divided 
into volatile and nonvolatile. The former type was seen in Chapter 16, while the latter is studied here. The 
complete list is shown below, where some promising next-generation technologies are also included. 


Volatile memories (Chapter 16): 

■ SRAM (static RAM) 
■ DDR and QDR (dual and quad data rate) SRAM 
■ DRAM (dynamic RAM) 
■ SDRAM (synchronous DRAM) 
■ DDR/DDR2/DDR3 SDRAM (dual data rate SDRAM) 
■ CAM (content-addressable memory) for cache memories 


Nonvolatile memories (Chapter 17): 

■ MP-ROM (mask-programmed ROM) 
■ OTP-ROM (one-time-programmable ROM, also called PROM) 
■ EPROM (electrically programmable ROM) 
■ EEPROM (electrically erasable programmable ROM) 
■ Flash (also called flash EEPROM) 


Next-generation memories (Chapter 17): 

■ FRAM (ferroelectric RAM) 
■ MRAM (magnetoresistive RAM) 
■ PRAM (phase-change RAM) 


17.2 Mask-Programmed ROM (MP-ROM) 


The first type of nonvolatile memory to be described is MP-ROM (mask-programmed read only memory), 
which is programmed during fabrication. A 4 x 3 (4 words of 3 bits each) ROM of this type is depicted in 
Figure 17.1(a). This is a NOR-type ROM because each column is a NOR gate (a pseudo-nMOS architecture 
was employed in this example; see Figure 10.12(b)). Considering the outputs after the inverters, the 
presence of a transistor corresponds to a '1', while its absence represents a '0'. A common construction 
approach for this type of ROM is to have a transistor fabricated at every node, then have the final 
interconnect mask select which ones should effectively participate in the circuit. 

To select a word in Figure 17.1(a), a '1' (V_DD) is applied to the corresponding word line (WL) with all 
the other WLs at 0 V. Suppose that WL0 has been selected; then the voltages of bit lines BL2 and BL1 are 
lowered by the nMOS transistors, while that of BL0 remains high due to the pull-up pMOS transistor. 
Therefore, after the inverters, the output is "110". The four words stored in this NOR-type MP-ROM are 
listed in Figure 17.1(c). Note that an address decoder (Section 11.5), which is not included in 
Figure 17.1(a), is needed to convert the N address bits into 2^N word lines. 
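
The read operation just described can be sketched as a small behavioral model in Python. This is an illustration only, not the circuit itself: the function names are made up for this example, and only the first row of the transistor map (which yields "110") follows the text, while the other three rows are placeholders because the contents of Figure 17.1(c) are not reproduced here.

```python
def decode_address(address, n_bits):
    """Convert an N-bit address into 2**N one-hot word lines (WL0 first)."""
    if not 0 <= address < 2 ** n_bits:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(2 ** n_bits)]


def read_nor_rom(transistors, word_lines):
    """Read one word from a NOR-type ROM.

    transistors[row][col] is True where an nMOS transistor is connected.
    A bit line is pulled low when a selected row has a transistor on it;
    otherwise the pMOS pull-up keeps it high. The output inverter then
    turns 'transistor present' into '1'.
    """
    n_cols = len(transistors[0])
    output = ""
    for col in range(n_cols):
        pulled_low = any(wl and row[col] for wl, row in zip(word_lines, transistors))
        bit_line = 0 if pulled_low else 1
        output += str(1 - bit_line)  # inverter at the output
    return output


# Columns are ordered BL2, BL1, BL0. Row 0 follows the text (output "110");
# rows 1 to 3 are hypothetical placeholders.
TRANSISTOR_MAP = [
    [True, True, False],
    [False, True, True],
    [True, False, False],
    [False, False, True],
]

word = read_nor_rom(TRANSISTOR_MAP, decode_address(0, n_bits=2))
print(word)  # 110
```

With N = 2 address bits the decoder produces the four word lines of the 4 x 3 array; exactly one of them is '1' at any time.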

A logically equivalent ROM implementation is shown in Figure 17.1(b). This is a NAND-type MP-ROM 
because each column is a NAND gate (see Figure 10.12(a)). Word selection is done with a '0', while keeping 
all the other WLs at '1'. If we again consider the outputs after the inverters, now the presence of a transistor 
is a '0', while its absence is a '1'. The advantage of this approach is that the nMOS transistors are directly 
connected to each other, with no ground or any other contact in between, thus saving silicon space. On the 
other hand, the large time constant associated with the long stack of transistors causes the NAND memory 
to be slower to access than its NOR counterpart. 

FIGURE 17.1. (a) 4x3 NOR-type pseudo-nMOS MP-ROM array; (b) Equivalent NAND-type pseudo-nMOS 
MP-ROM array; (c) Memory contents; (d) Implementation using conventional gates. 

When using CPLDs (Chapter 18), ROMs are often implemented with traditional gates. An example is 
shown in Figure 17.1(d), which produces the same output words as the ROMs of Figures 17.1(a) and (b), 
listed in Figure 17.1(c). 
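
As a rough check of the NAND-type convention (a transistor stores a '0', its absence a '1', and the selected word line is driven to '0'), the following Python sketch models one series stack per column. The names are illustrative, and the word list is hypothetical apart from its first entry, "110", which comes from the text.

```python
def read_nand_rom(transistors, selected_row):
    """Read one word from a NAND-type ROM.

    transistors[row][col] is True where a series nMOS transistor exists;
    where it is False the stack is simply wired through. The selected word
    line is driven to '0' and all the others to '1'.
    """
    n_rows = len(transistors)
    n_cols = len(transistors[0])
    word_lines = [0 if r == selected_row else 1 for r in range(n_rows)]
    output = ""
    for col in range(n_cols):
        # The stack conducts only if every transistor in it is ON.
        conducts = all(
            word_lines[r] == 1 or not transistors[r][col] for r in range(n_rows)
        )
        bit_line = 0 if conducts else 1
        output += str(1 - bit_line)  # inverter at the output
    return output


def nand_map_for(words):
    """Place a transistor wherever the stored bit is '0'."""
    return [[bit == "0" for bit in word] for word in words]


# Hypothetical contents; only the first word ("110") comes from the text.
WORDS = ["110", "011", "100", "001"]
nand_rom = nand_map_for(WORDS)
print([read_nand_rom(nand_rom, r) for r in range(4)])
```

The transistor map is the complement of the one a NOR-type array would need for the same contents, which is why the two implementations are logically equivalent.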


17.3  One-Time-Programmable ROM (OTP-ROM) 


Old construction techniques for OTP-ROMs are based on fuses or antifuses, which either open or 
create contacts when traversed by a relatively large electric current, hence allowing transistors/gates 
to be removed from or inserted into the circuit. However, due to their lower power consumption and mature 
technology, EPROM cells (described below) are currently preferred. In this case, the EPROM array 
is housed in a package that lacks the transparent window (for erasure) of regular 
EPROMs. OTP-ROMs are compact and inexpensive, at the expense of versatility, because they 
cannot be reprogrammed. 


17.4 Electrically Programmable ROM (EPROM) 


Nearly all commercial reprogrammable ROMs are based on floating-gate transistors. The first electrically 
programmable ROM (EPROM) employed the FAMOS (floating-gate avalanche-injection MOS) transistor, 
depicted in Figure 17.2(a). Compared to a conventional MOSFET (Figure 9.2), it presents an additional 
gate surrounded by insulating material (SiO2) and with no connections to the external circuit. For that 
reason, this gate is called floating gate, while the regular gate is referred to as control gate. 

If a large positive voltage is applied to the control gate (12 V) and also to the drain (6 V), with the 
source grounded, high-energy electrons start flowing between the source and drain terminals. Some of 
these electrons might acquire enough kinetic energy such that, after scattering in the crystal lattice and 
being accelerated by the transversal electric field (due to the gate voltage), they are able to traverse the 
thin (<100 nm) oxide layer that separates the transistor channel from the floating gate. Once they reach 
the floating gate, they are trapped and remain there indefinitely (if no strong opposing electric field 
is applied, of course). Such electrons are referred to as “hot” electrons, and this phenomenon is called 
channel hot-electron injection (CHEI). 

FIGURE 17.2. (a) Cross section of a FAMOS transistor (for EPROM cells), which is programmed by avalanche 
injection and erased by UV light; (b) NOR-type pseudo-nMOS-based EPROM array. 

The presence of electrons in the floating gate raises the threshold voltage (V_T) from around 1 V (with 
no charge) to over 6 V. Therefore, given that all logic voltages are ≤5 V, such a transistor can never be 
turned ON. A transistor without charge stored in the floating gate is said to be erased (this is considered 
a '1'), while a transistor with electrons accumulated in the floating gate is said to be programmed 
(considered to be a '0'). 

The floating-gate transistor, with the additional gate included in its symbol, is used to implement an 
EPROM array in Figure 17.2(b). Like the circuit in Figure 17.1(a), this too is nonvolatile, asynchronous, 
NOR-type, and employs pseudo-nMOS logic (Section 10.7). Note that in the EPROM cells a single transistor 
is used for storage and for word selection. 

To program the EPROM cells, the whole array must first be erased, which is done with an EPROM 
programmer, where the memory is exposed to UV radiation through a transparent window 
in its package for several minutes (the time depends on light intensity, distance, etc.). Next, each 
row (word) can be individually programmed. To do so, first the values to be stored are applied to 
the bit lines (BLs), then the corresponding word line (WL) is pulsed high. Recall that a programmed 
transistor represents a '0', while an erased transistor is a '1'. For example, to write "0 1 1" to the first row 
of Figure 17.2(b), the following voltages are needed: BL2=6 V, BL1=BL0=0 V, WL0=WL1=WL2=0 V; 
after applying these voltages, WL0 must be pulsed high (12 V) for a few microseconds, causing the 
leftmost transistor to be programmed ('0'), while the other two remain erased ('1'). 
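
The row-programming sequence above can be summarized in a short Python sketch. It is a behavioral illustration under simplifying assumptions: the 3 x 3 array size, the function names, and the idea that a cell is programmed whenever it sees both the 6 V drain voltage and the 12 V gate pulse are modeling choices, not device data.

```python
V_DRAIN_PROGRAM = 6.0   # volts on a bit line whose cell must store '0'
V_GATE_PROGRAM = 12.0   # volts of the word-line programming pulse


def bit_line_voltages(word):
    """Map each bit of the word to the bit-line voltage used while programming."""
    return [V_DRAIN_PROGRAM if bit == "0" else 0.0 for bit in word]


def program_row(array, row, word):
    """Program one row of an erased array (all cells '1').

    Only the selected row receives the V_GATE_PROGRAM pulse, and only the
    cells whose bit line sits at V_DRAIN_PROGRAM are driven to '0'.
    """
    for col, v_bl in enumerate(bit_line_voltages(word)):
        if v_bl == V_DRAIN_PROGRAM:
            array[row][col] = "0"
    return array


erased = [["1"] * 3 for _ in range(3)]   # array right after UV erasure
print(bit_line_voltages("011"))          # [6.0, 0.0, 0.0]
print(program_row(erased, 0, "011")[0])  # ['0', '1', '1']
```

Note that the model can only turn a '1' into a '0'; returning a cell to '1' requires erasing the whole array again.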

Most EPROMs are fully static and asynchronous (as in the example in Figure 17.2(b)), so to read their 
contents only addresses and the proper chip-enable signals are needed. After the address is processed by 
the address decoder, one of the WLs is selected (WL='1'), while all the others remain low. If a transistor in 
the selected row is programmed (that is, has charge in its floating gate, so V_T is high), it will remain OFF, 
thus not affecting the corresponding BL (stays high). On the other hand, if it is erased (without charge, so 
V_T is low), it will be turned ON, hence lowering the corresponding BL voltage. Therefore, after inverters, 
the outputs will be '0' for the FAMOS cells that are programmed and '1' for those that are erased. 
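
A minimal Python sketch of this read decision is given below. The 1 V threshold and the 5 V logic level come from the text; the 6.5 V value is only a stand-in for "over 6 V", and the function names are illustrative.

```python
V_T_ERASED = 1.0       # approximate threshold voltage with no stored charge
V_T_PROGRAMMED = 6.5   # stand-in for "over 6 V" (assumed value)
V_WL_SELECTED = 5.0    # logic '1' applied to the selected word line


def cell_is_on(v_gate, v_threshold):
    """An nMOS cell conducts when its gate voltage exceeds its threshold."""
    return v_gate > v_threshold


def read_bit(v_threshold, selected=True):
    """Return the output (after the inverter) for one cell on a bit line."""
    v_gate = V_WL_SELECTED if selected else 0.0
    bit_line = 0 if cell_is_on(v_gate, v_threshold) else 1  # pull-up keeps BL high
    return 1 - bit_line


print(read_bit(V_T_ERASED))      # 1
print(read_bit(V_T_PROGRAMMED))  # 0
```

An unselected cell has 0 V on its gate, so it stays OFF for either threshold value and does not disturb the bit line.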

The main limitation of EPROMs is the erasure procedure. As mentioned above, to remove charge from 
the floating gates the memory array must be exposed to UV light (erasure occurs because the UV radiation 
generates electron-hole pairs in the insulator, causing it to become slightly conductive) for several minutes, 
a process that is time consuming and cumbersome (off system). Another important limitation is their 
endurance, normally limited to about 100 erasure cycles. These limitations, however, do not apply when 
the EPROM is intended for OTP (one-time-programmable) applications (Section 17.3), which is basically 
the only case where the EPROM cell is still popular. 

A summary of typical features of standalone EPROM chips is presented below. 


■ Density: 128 Mb 
■ Architecture: Normally fully-static asynchronous, but synchronous is also available. 
■ Erase time: Several minutes (whole array) 
■ Program (write) time: 5 µs/word 
■ Access (read) time: 45 ns 
■ Supply voltages: 5 V down to 2.7 V 
■ Programming voltages: 12 V/6 V 
■ Endurance (erase-program cycles): 100 
■ Current main application: OTP-ROMs 


17.5 Electrically Erasable Programmable ROM (EEPROM) 


EEPROM solves the erasure problem of EPROM with a slight modification in the floating-gate transistor. 
The new device, called FLOTOX (floating-gate tunneling-oxide) transistor, is depicted in Figure 17.3(a). 
Comparing it to a traditional floating-gate transistor (Figure 17.2(a)), we observe that the floating gate 
now has a section running over and very near (~10nm) the drain. With such a thin oxide, and under the 
proper electric field, electrons can traverse the oxide, thus moving from the drain to the floating gate and 
vice versa by a mechanism called Fowler-Nordheim tunneling. Therefore, to program the device, a high 
voltage (12 V) must be applied to the control gate with the drain grounded, which causes electrons to 
tunnel from the drain to the floating gate, where they remain indefinitely if no other strong electric field 
is ever applied. The tunneling effect is bidirectional, so if the voltages above are reversed the electrons 
will tunnel back to the drain, hence erasing the transistor. 

The ideal goal with the FLOTOX transistor was the construction of memory arrays with single-transistor 
cells that could be individually erased and programmed (thus avoiding the lengthy and nonversatile erasure 
procedure of EPROMs). Such a construction, however, is not possible due to two major limitations of the 
FLOTOX cell, the first related to the program-erase procedure and the second concerning the difficult control 
over the threshold voltage. 

The problem with the program-erase procedure is that in EEPROMs it is completely symmetric, that is, 
the same process (tunneling) and the same cell section (floating-gate/drain overlap) are used for programming 
and for erasing. Therefore, it is difficult to write a bit into a cell without disturbing neighboring cells. 

The second problem mentioned above concerns the difficult control over the FLOTOX transistor’s threshold 
voltage, which is determined by the amount of charge accumulated in the floating gate. This occurs 
because tunneling is not a self-limiting process (in contrast, avalanche injection, which happens 
in the FAMOS transistor of EPROMs, is self-limiting, due to the growing opposing electric field caused 
by the accumulated charge). As a result, the gate might even become positively charged (over-erased), in 
which case the device can never be turned OFF (because V_T is then negative, like in a depletion MOSFET). 
Therefore, individual transistor control is desirable to guarantee the proper value of V_T. 

FIGURE 17.3. (a) Cross section of a FLOTOX transistor (for EEPROM cells), which is programmed and erased by 
means of tunneling between drain and floating gate; (b) NOR-type pseudo-nMOS-based EEPROM array. 

Both problems described above are solved by an extra (cell-select) transistor included in the EEPROM 
cell, shown in the array of Figure 17.3(b) (Vpp is a programming voltage). The extra transistor allows each 
cell to be individually accessed and controlled, thus rendering a completely versatile circuit. Moreover, 
erasure is now much faster (a few milliseconds per word or page) than for EPROMs (several minutes). 
The obvious drawback is the much larger size of the EEPROM cell compared to EPROM. 

A final remark concerns the memory-write procedure. Like EPROM, the programming procedure 
starts with a cell-erase cycle, followed by a cell-program cycle. Again, an erased transistor (without 
charge in the floating gate, hence a low V_T) represents a '1', whereas a programmed transistor (with 
electrons trapped in the floating gate, hence a high V_T) is considered a '0'. Therefore, the cell must be 
programmed only when a '0' needs to be stored in it. In practice, this procedure (erase-program) is conducted 
using a train of pulses, with the proper values and applied to the proper terminals, during which 
the cells are carefully monitored until the desired value of V_T results. 
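
The pulse-and-monitor idea can be illustrated with a simple loop in Python. This sketch covers only the programming half of the procedure, and it assumes that every pulse raises the threshold voltage by a fixed amount; the parameter `shift_per_pulse` is a hypothetical stand-in for device physics that the text does not quantify.

```python
def program_with_verify(v_t_start, v_t_target, shift_per_pulse, max_pulses=100):
    """Apply programming pulses until the threshold voltage reaches the target.

    Returns (final_v_t, pulses_used). The loop condition plays the role of
    the verify step performed between pulses.
    """
    if shift_per_pulse <= 0:
        raise ValueError("shift_per_pulse must be positive")
    v_t = v_t_start
    pulses = 0
    while v_t < v_t_target:
        if pulses == max_pulses:
            raise RuntimeError("cell failed to reach the target threshold")
        v_t += shift_per_pulse  # one programming pulse
        pulses += 1
    return v_t, pulses


print(program_with_verify(1.0, 6.0, 0.5))  # (6.0, 10)
```

Stopping as soon as the target is met is what keeps the final threshold voltage under control when the underlying mechanism is not self-limiting.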

To conclude, a summary of typical features of standalone EEPROM chips is presented below. 


Architecture: Parallel (older) and serial (newer) 

Density: 1 Mb parallel, 256 Mb serial 

Write (erase plus program) time: 10 ms/page parallel, 1.5 ms/word serial 
Access (read) time: 70 ns to 200 ns parallel, 1 µs serial 

Supply voltages: 1.8 V to 5 V 


Endurance (erase-program cycles): 10^5 to 10^6 


Data retention: >10 years 


Note: EEPROM has been almost entirely replaced with Flash EEPROM (described next). 


17.6 Flash Memory 


Flash memory (also called Flash EEPROM) is a combination of EPROM with EEPROM. It requires 
only one transistor per cell (like EPROM), which is electrically erasable and electrically programmable 
(like EEPROM). Its high density and low cost have made it the preferred choice when reprogrammable 
nonvolatile storage is needed. 


ETOX cell 


The original transistor for flash memories is the ETOX (EPROM tunnel oxide) transistor, introduced 
by Intel in 1984 and with many generations developed since then. Two ETOX versions (for 180 nm 
and 130 nm technologies) are depicted in Figure 17.4. Both operate with avalanche (hot-electron) injection 
for programming (as in EPROM) and tunneling for erasure (as in EEPROM). To avoid the extra 
(cell-access) transistor of EEPROM, in-bulk erasure is performed, which consists of erasing the whole 
chip or, more commonly, a whole block or sector at once (thus the name flash). Moreover, contrary to 
EEPROM, a nonsymmetric procedure is used for programming-erasing, which eases the control over 
the device in its several operating modes (erase, program, and read). 

FIGURE 17.4. (a) and (b) Simplified cross sections of two ETOX transistors, for 180 nm and 130 nm technologies, 
respectively, both erased by tunneling and programmed by hot-electron injection; (c) NOR-type flash 
array; (d) Situation during erasure (with ETOX of (a)); (e) Situation during programming. 

The main difference between the ETOX transistors of Figures 17.4(a) and (b) resides in their tunneling 
(erasure) mechanism. While in Figure 17.4(a) tunneling occurs only through the overlap between the 
floating gate and the source diffusion, in Figure 17.4(b) it occurs over the whole channel, resulting in a 
faster cell-erase operation. The graded source doping shown in Figure 17.4(a) is to prevent band-to-band 
tunneling. In Figure 17.4(b), additional details regarding the wells were included. 


ETOX programming 


To program a flash array, it must first be erased. The overall process is illustrated in Figures 17.4(c)-(e), 
where the ETOX transistor of Figure 17.4(a) was employed to construct a NOR-type array (only a 2x2 
section is shown). Erasure is depicted in Figure 17.4(d). It can be accomplished either with negative 
pulses (-12 V) applied to all gates (WLs) with the sources (SL) grounded, or with positive pulses applied 
to the sources (SL) with the gates (WLs) grounded, always with the drains (BLs) open, hence forcing 
electrons to tunnel out of the floating gate into the source (indicated by “tunnel-out” in the figure). At 
the conclusion of erasure, all flash cells contain a '1'. 

Programming is illustrated in Figure 17.4(e), where the bits '0' and '1' are written into the first array 
row. To the first BL a positive voltage (6 V) is applied (to write a '0'), while the second BL receives 0 V (to 
keep the cell erased). The selected WL is then pulsed high (12 V), with the other WLs at 0 V. These pulses 
cause channel hot-electron (avalanche) injection onto the floating gate of the first transistor of the first 
row (indicated by “CHEI” in the figure). Note that the second cell of the first row, which we want to keep 
erased, is only subject to a (very weak) inward tunneling, which only causes a negligible variation of 
that transistor’s V_T. Observe also that the cells in the unselected rows are not affected by the operations 
in the selected row. 

Note: As depicted in the descriptions of all floating-gate devices, to erase and program them a rather 
complex sequence of voltages and pulses is needed, which depends on particular construction details, 
varying with device generation and sometimes also from one manufacturer to another. For these 
reasons, the voltage values and sequences described in this chapter, though closely related to actual 
setups, are just illustrative. 
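
From the point of view of the stored data, the erase-then-program sequence has a simple consequence: erasure sets every cell of a block to '1', and programming can only change a '1' into a '0'. The Python sketch below models this behavior; the class name, the 8-bit word width, and the block size are assumptions made for the illustration.

```python
class FlashBlock:
    """Behavioral model of one erase block; each word is stored as an int."""

    def __init__(self, n_words, word_bits=8):
        self.mask = (1 << word_bits) - 1
        self.words = [self.mask] * n_words  # fresh block: all cells erased ('1')

    def erase(self):
        """In-bulk erasure: every cell of the block returns to '1'."""
        self.words = [self.mask] * len(self.words)

    def program(self, address, value):
        """Programming can only move bits from '1' to '0'."""
        self.words[address] &= value & self.mask

    def read(self, address):
        return self.words[address]


block = FlashBlock(n_words=4)
block.program(0, 0b10101010)
block.program(0, 0b11110000)  # second write without erasing first
print(bin(block.read(0)))     # 0b10100000, not 0b11110000
block.erase()
block.program(0, 0b11110000)
print(bin(block.read(0)))     # 0b11110000
```

The bitwise AND in `program` is what makes a write to a non-erased word return the wrong data, which is why the whole block must be erased before new contents are stored.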


Split-gate cell 

The flash cells of Figure 17.4 are very compact, but are also very complex to operate, particularly with respect 
to attaining the correct values for V_T. Consequently, standalone flash-memory chips often have a built-in 
microprocessor just to control their erase-program procedures. Even though this is fine in standalone memory 
chips, which are very large, it is not adequate for (much smaller) embedded applications. 

A popular alternative for embedded flash is the split-gate cell shown in Figure 17.5(a). As can be seen, 
the floating gate only covers part of the channel, while the control gate covers the complete channel. 
This is equivalent to having two transistors in series to control the gate, one with a floating gate, the other 
just a regular MOS transistor. The disadvantage of this cell is its size, which is bigger than ETOX, being 
therefore not appropriate for large (standalone) memories. On the other hand, it avoids the over-erasure 
problem, simplifying the erase-program procedure and therefore eliminating the need for a built-in 
controller (so the cell oversize is justifiable in embedded applications). 

As seen earlier, over-erasure can occur because tunneling is not a self-limiting process, so the gate can 
become positively charged (V_T < 0 V), in which case the device can never be turned OFF. Because the cell 
of Figure 17.5(a) contains two transistors, when a '0' is applied to the control gate it automatically closes 
the channel, so the cell is turned OFF even if its floating-gate part has been over-erased. In other words, 
over-erasure is no longer a concern, so the memory-write process is faster and simpler. 
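
The effect of the series arrangement can be checked with a small Python model. The threshold values below (0.7 V for the regular section, -1 V for an over-erased floating-gate section) are assumptions chosen only to illustrate the argument.

```python
def single_transistor_conducts(v_control_gate, v_t_floating):
    """Single floating-gate transistor, as in the cells of Figure 17.4."""
    return v_control_gate > v_t_floating


def split_gate_conducts(v_control_gate, v_t_floating, v_t_select=0.7):
    """Split-gate cell: floating-gate section in series with a regular MOS section.

    v_t_select is an assumed threshold for the regular section.
    """
    floating_part_on = v_control_gate > v_t_floating
    select_part_on = v_control_gate > v_t_select
    return floating_part_on and select_part_on


OVER_ERASED_V_T = -1.0  # illustrative negative threshold

print(single_transistor_conducts(0.0, OVER_ERASED_V_T))  # True: cannot be turned OFF
print(split_gate_conducts(0.0, OVER_ERASED_V_T))         # False: series section blocks the channel
```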


SONOS cell 


Another modern flash cell, used in standalone flash memories and in some embedded applications, is 
shown in Figure 17.5(b). This cell, called SONOS (silicon-oxide-nitride-oxide-silicon), has no floating gate, 
which simplifies the fabrication process (single poly). 

Instead of a floating gate, it contains a nitride (Si3N4) layer. An important characteristic of nitride is its 
richness of defects that act as charge traps; therefore, once the electrons reach that layer, they are trapped 
and remain there indefinitely (as in a floating gate). Another advantage of this cell is that it can be 
programmed with less power (because it can operate using only tunneling, which uses smaller currents) 
and also smaller voltages. This technology has been applied also to the split-gate architecture described 
above for embedded applications. 

FIGURE 17.5. (a) Split-gate cell for embedded flash memories; (b) SONOS cell. 


NOR flash versus NAND flash 


Contrary to the previous nonvolatile memory architectures, which are generally NOR-type, flash memories 
are popular in both NOR and NAND configurations. Both architectures are depicted in Figure 17.6, with 
a NOR (transistors in parallel) flash shown in Figure 17.6(a) and a NAND (transistors in series) flash in 
Figure 17.6(b). In the latter, each stack of transistors is called a NAND module. 

Both arrays present the traditional word lines (WLs), bit lines (BLs), and source line (SL), to which the 
proper voltages are applied during the erase, program, and read operations. In the NAND flash, however, two 
additional selection lines can be observed, called selD (select drain) and selS (select source), needed because 
the BLs are shared by many NAND modules. Such modules are normally constructed with 16 transistors each, 
so the transistor per bit relationships are 1T/1bit for NOR and 18T/16bits for NAND. 
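
The 18T/16bits figure follows from adding the two select transistors (selD and selS) to the 16 storage transistors of each module. A short Python calculation, with an illustrative function name, makes the overhead explicit for other module sizes as well.

```python
from fractions import Fraction


def nand_transistors_per_bit(cells_per_module, select_transistors=2):
    """Each NAND module adds the selD and selS transistors to its storage cells."""
    return Fraction(cells_per_module + select_transistors, cells_per_module)


ratio = nand_transistors_per_bit(16)
print(ratio, float(ratio))  # 9/8 1.125
```

The overhead shrinks as the module gets longer, at the price of a longer series stack during reading.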

To construct a NOR flash, any of the cells seen above can be employed, where a combination of CHEI 
and tunneling is normally employed. On the other hand, for NAND flash, only tunneling is generally 
used, which lowers the power consumption and allows larger memory blocks to be processed at a time. 
In these cells, the erasing procedure (intentionally) continues until the cells become over-erased (that is, 
until excessive electrons are removed from the floating gates, which then become positively charged), 
hence turning the threshold voltage negative (as in a depletion MOSFET), with a final value of V_T around 
-2.5 V. Programming consists of replacing charge until V_T becomes positive again, at around 0.7 V. As a 
result, transistors with V_T = -2.5 V will never be turned OFF, while those with V_T = 0.7 V will be turned 
ON when a '1' is applied to their gates. 

To read the NAND flash of Figure 17.6(b), WL='1' must be supplied to all rows except for the 
selected one, to which WL='0' is applied. WL='1' causes all transistors in the unselected rows to be 
short-circuited, hence leaving to the selected-row transistor the decision on whether to lower the BL 
voltage or not. If the transistor is programmed (V_T ≈ 0.7 V), it will remain OFF (recall that WL=0 V), 
thus not affecting the corresponding BL. On the other hand, if the transistor is erased (V_T ≈ -2.5 V), 
it will be ON anyway, regardless of WL, thus lowering the corresponding BL voltage unconditionally. 
After output inverters, the logic levels become '0' or '1', respectively (thus again a programmed 
transistor represents a '0', while an erased one, over-erased indeed, represents a '1'). This procedure 
is performed with selD and selS asserted. 
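
The read procedure for one NAND module can be sketched in Python as follows. The two threshold voltages come from the text; the 3.3 V level used for WL='1' and the four-transistor stack are assumptions made for the example, and the select transistors are taken to be ON.

```python
V_T_ERASED = -2.5      # over-erased cell, stores '1'
V_T_PROGRAMMED = 0.7   # programmed cell, stores '0'
V_WL_HIGH = 3.3        # assumed voltage for WL='1' (not given in the text)


def read_nand_module(thresholds, selected_row):
    """Return the output bit for one NAND module with selD and selS asserted.

    thresholds[i] is the threshold voltage of the i-th transistor in the stack.
    """
    for row, v_t in enumerate(thresholds):
        v_gate = 0.0 if row == selected_row else V_WL_HIGH
        if not v_gate > v_t:
            return 0  # stack is open, BL stays high, inverter outputs '0'
    return 1          # stack conducts, BL is pulled low, inverter outputs '1'


stack = [V_T_ERASED, V_T_PROGRAMMED, V_T_PROGRAMMED, V_T_ERASED]
print([read_nand_module(stack, r) for r in range(4)])  # [1, 0, 0, 1]
```

Because every unselected transistor conducts regardless of its state, the result depends only on the cell in the selected row.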

One advantage of NAND flash over NOR flash is its lower cost due to the fact that the transistors in 
the NAND modules are directly connected to each other, without any contact in between, which reduces 
the silicon area substantially (~40%). Another advantage is its faster memory-write time (because erasure 
in NAND flash is faster). On the other hand, reading is slower because of the elevated time constant 
associated with the long stack of transistors. These features make NOR and NAND flashes more like 
complementary technologies than competing technologies because they aim at distinct applications. 
For example, program code in computer applications might not change much, but fast read is crucial, so 
NOR flash is appropriate. On the other hand, in video and audio applications, which normally require 
large blocks of data to be stored and frequently renewed, NAND flash is a more suitable candidate (the 
interest in NAND flash has grown immensely recently). Another fast-growing application for NAND 
flash is as a substitute for hard disks. 

FIGURE 17.6. (a) NOR-type flash (transistors in parallel); (b) NAND-type flash (transistors in series). 

One last comment, which refers to flash as well as to EEPROM cells, regards their endurance and 
data retention capabilities. The endurance, measured in erase-program cycles, is in the 10^5-10^6 range. 
Though not all causes that contribute to limit the number of cycles are fully understood, the main 
contributors seem to be defects in the thin oxide and at the Si-SiO2 interface, which behave as electron 
traps, as well as defects in the interpoly (between gates) oxide, which behave as hole traps. The trapped 
particles reduce the floating gate’s capacity to collect electrons, thus limiting the changes of V_T. Data 
retention, measured in years (typically >10 years), is limited mainly by defects in the thin oxide, which 
cause charge leakage. 


Multibit flash 


To increase data density, 2-bit flash cells are also available. Two approaches have been used, one 
called multilevel cell (MLC, Figure 17.7(a)) and the other called multibit cell (MBC, Figure 17.7(b)). The 
MLC cell is essentially the same ETOX cell seen in Figure 17.4(a), just with the capacity of handling 
distinct amounts of charge (detected by more elaborate sense amplifiers), hence providing more than 
two voltage levels. As indicated in the figure, there are three levels of programming ("00", "01", "10"), 
and one level of erasure ("11"). This is only possible because of advances in fabrication processes, 
which allow finer control over the amounts of charge injected onto or removed from the floating 
gate. The ONO (oxide-nitride-oxide) layer shown in Figure 17.7(a) refers to an intermediate layer 
of Si3N4, which has a high dielectric constant (7.8) to prevent electrons from the floating gate from 
reaching the control gate. 
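
The way a multilevel cell yields two bits can be illustrated by comparing the cell's threshold voltage against three reference levels. In the Python sketch below, both the reference voltages and the assignment of codes to the three programmed bands are hypothetical; only the facts that there are four bands and that the erased band reads "11" come from the text.

```python
import bisect

# Hypothetical reference voltages separating the four threshold bands, in volts.
REFERENCE_LEVELS = [2.0, 3.5, 5.0]
# The lowest band is the erased state; the order of the other codes is assumed.
CODES = ["11", "10", "01", "00"]


def decode_mlc(v_threshold):
    """Return the 2-bit value of a cell, as a multi-reference sense amplifier might."""
    band = bisect.bisect_right(REFERENCE_LEVELS, v_threshold)
    return CODES[band]


print([decode_mlc(v) for v in (1.0, 2.5, 4.0, 6.0)])  # ['11', '10', '01', '00']
```

The narrower the bands, the finer the control over the stored charge has to be, which is why this approach depends on advances in the fabrication process.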

The second cell (MBC) is similar to the SONOS cell seen in Figure 17.5(b), where instead of a regular 
(conductive) floating gate a trap-based floating gate (nitride) is used. In this case, the electrons remain 
in the region of the floating gate where they were captured. Therefore, even though the cell is perfectly 
symmetric, the left and right sides can behave differently. Note that the source and drain terminals 
are interchanged during operation. Both cells of Figure 17.7 operate with tunneling for erasure and 
generally with CHEI for programming. 

FIGURE 17.7. Two-bit flash cells: (a) Multilevel cell (MLC); (b) Multibit cell (MBC). 


Typical flash specifications 
To conclude, a summary of typical features of standalone flash-memory chips is presented below. 


■ Cell architectures: 1 bit/cell and 2 bits/cell 
■ Array architectures: NOR, NAND, serial 

■ Endurance (erase-program cycles): 10^5 to 10^6 
■ Data retention: >10 years 
■ Technology: 130 nm, 90 nm, 65 nm, 45 nm 
■ Supply voltages: 5 V down to 1.8 V 

NOR flash: 

■ Density: 512 Mb 
■ Die efficiency (cell size): 7 Mb/mm² 
■ Erase method and final V_T: Tunneling, ~1 V 
■ Program method and final V_T: CHEI, ~6 V 
■ Erase time: 200 ms/block (512 B) 
■ Program time: 2 µs/word 
■ Synchronous read access time: 70 ns initial access, 6 ns sequential/burst (133 MHz) 
■ Asynchronous read access time: 70 ns random 

NAND flash: 

■ Density: 4 Gbits single-die, 32 Gbits stacked-die 
■ Die efficiency (cell size): 11 Mb/mm² 
■ Erase method and final V_T: Tunneling, ~-2.5 V 
■ Program method and final V_T: Tunneling, ~0.7 V 
■ Erase time: 1.5 ms/block (16 KBytes) 
■ Program time: 200 µs/page (512 Bytes) 
■ Read access time: 15 µs random, 15 ns serial 
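
To put the erase and program figures above on a common footing, they can be converted to time per byte. The Python lines below use only the values listed in the summary, with the assumption that 16 KBytes means 16 x 1024 bytes; the NOR program time is left out because the word size is not stated.

```python
def seconds_per_byte(time_s, size_bytes):
    """Normalize a block or page operation time to a single byte."""
    return time_s / size_bytes


nor_erase = seconds_per_byte(200e-3, 512)          # 200 ms per 512 B block
nand_erase = seconds_per_byte(1.5e-3, 16 * 1024)   # 1.5 ms per 16 KB block
nand_program = seconds_per_byte(200e-6, 512)       # 200 us per 512 B page

print(f"NOR erase:    {nor_erase * 1e6:.1f} us/byte")
print(f"NAND erase:   {nand_erase * 1e6:.3f} us/byte")
print(f"NAND program: {nand_program * 1e6:.3f} us/byte")
print(f"NOR/NAND erase ratio: {nor_erase / nand_erase:.0f}")
```

With these numbers, erasing a byte takes several thousand times longer in the NOR device than in the NAND device, which is consistent with the faster memory-write time attributed to NAND flash.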


17.7 Next-Generation Nonvolatile Memories 


The need for large nonvolatile storage media (e.g., for portable audio/video applications), with fast read 
and write cycles, low power consumption, and low cost, has spurred an intense search for new 
memory materials, preferably compatible with traditional CMOS processes. Ideally, such next-generation 
memories should exhibit at least the following features: 


■ Very high density (terabits/in²) 
■ Nonvolatility (>20 years data retention) 
■ Fast read and fast write cycles (few ns for both, comparable to current 6T SRAM) 
■ Large endurance (~infinite read-write cycles) 
■ Low power consumption (less than any current memory) 
■ Low cost (comparable to current DRAM) 


Though still far from ideal, three promising approaches (among several others) are briefly described 
in this section. They are: 


■ FRAM 
■ MRAM 
■ PRAM 


17.7.1 Ferroelectric RAM (FRAM) 


FRAM memory was introduced by Ramtron in the 1990s, with substantial progress made since then. An 
FRAM cell is depicted in Figure 17.8. It is similar to a DRAM cell (Figure 16.8) except for the fact that the 
capacitor’s content is now nonvolatile. Such a capacitor is constructed with a ferroelectric crystal as its 
dielectric. 

Contrary to what the name might suggest, a ferroelectric crystal (usually lead-zirconate-titanate, PZT) 
does not contain iron nor is it affected by magnetic fields. The ferroelectric capacitor operation is based 
on the facts that the central atom in the crystal’s face-centered cubic structure is mobile, has two stable 
positions, and can be moved from one stable position to the other (in less than 1 ns) by an external electric 
field. That is, when a voltage is applied to the capacitor in one direction, it causes the central atom to 
move in the direction of the applied field until it reaches the other stable position (still inside the cube, of 
course), where it remains indefinitely after the field is removed. If the direction of the field is reversed, 
then the atom returns to the previous stable position. 

To read an FRAM cell, the proper word line is selected, turning ON the nMOS transistor, hence 
applying the bit line voltage to the capacitor. This voltage creates an electric field through the crystal, 
which dislocates the mobile atom to the other stable position or not, depending on its current position. 
If it is dislocated, it passes through a central high-energy position where a charge spike occurs in the 
corresponding bit line, which is detected by the sense amplifier. Therefore, because it is possible to 
detect in which of the two stable positions the atom initially was, the cell can store '0' and '1' values. 
One inconvenience of this process is that it is destructive because the memory-read operation causes 
all atoms to move to the same side of the crystal. Consequently, after reading the cell, a memory-write 
operation must follow to restore the atom to its original position. 

FIGURE 17.8. FRAM cell. 
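
The destructive read followed by a write-back can be modeled in a few lines of Python. In this sketch the stored bit stands for the position of the mobile atom, and the read field is assumed to push the atom toward the '0' position; that choice, like the class and method names, is arbitrary and made only for illustration.

```python
class FramCell:
    """Behavioral model: the stored bit is the position of the mobile atom."""

    def __init__(self, bit=0):
        self.state = bit

    def write(self, bit):
        self.state = bit

    def raw_read(self):
        """Destructive read: a charge spike appears only if the atom moves."""
        spike = self.state == 1  # atom had to switch sides
        self.state = 0           # after the read, the atom sits on the '0' side
        return 1 if spike else 0

    def read(self):
        """Read followed by the write-back that restores the original content."""
        bit = self.raw_read()
        self.write(bit)
        return bit


cell = FramCell(1)
print(cell.raw_read(), cell.raw_read())  # 1 0  (the content is lost after the first read)

cell = FramCell(1)
print(cell.read(), cell.read())          # 1 1  (write-back preserves the content)
```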

FRAM memories are currently offered with serial and parallel architectures. The former is offered in 
densities varying from 4kb to 512kb, operating with bus speeds up to 20 MHz. The density of the latter 
is currently in the 64 kb to 4Mb range with 55ns access time. Power supply options for both are from 5 V 
down to 2.7 V. The endurance (number of read-write cycles) is virtually unlimited (>10^10), and the power 
consumption is smaller than that of EEPROM or flash. On the other hand, the cells are still much larger 
than DRAM cells, so high density is not possible yet, and the cost per bit is also higher than DRAM. 


17.7.2 Magnetoresistive RAM (MRAM) 


Figure 17.9(a) shows an MRAM cell. Compared to a DRAM cell (Figure 16.8), a variable resistor 
connected to the drain of an nMOS transistor is observed instead of a capacitor connected to its 
source terminal. It also shows the existence of an extra global line, called digit line, needed to program 
the MRAM cell. 

The variable resistor of Figure 17.9(a) represents an MTJ (magnetic tunnel junction) device (Figure 17.9(b)),
which constitutes the core of the MRAM cell. The MTJ is formed by two ferromagnetic layers separated by a
very thin (~2 nm) tunneling barrier layer. The latter is constructed with Al2O3 or, more recently, with MgO.
The ferromagnetic layers are programmed by electric currents flowing through the bit and digit lines, which
are physically very close to the ferromagnetic layers, such that the magnetic fields created by the currents
can spin-polarize each one of these layers. One of the ferromagnetic layers (bottom plate) is “fixed” (that
is, it is magnetized by a current flowing always in the same direction through the digit line and is pinned
by the underlying antiferromagnetic layer), while the other (top plate) is “free” (that is, it can be polarized
in either direction, according to the direction of the current through the bit line). If the free layer is polarized
in the same direction as the fixed layer (that is, if their spins are parallel), then electrons can “tunnel” through
the barrier layer (small resistivity), while spins in opposite directions (antiparallel) cause the barrier to present
a large resistivity. A small R is considered to be a '0', while a large R is a '1'.

To read the MRAM cell, the proper word line is selected and a voltage is applied to the bit line. By 
comparing the current through the bit line against a reference value, the corresponding current-sense 
amplifier can determine whether a '0' (low R, so high I) or a '1' (high R, so low I) is stored in the cell. Note
that, contrary to FRAM, memory-read in MRAM is not destructive. 
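The current comparison performed by the sense amplifier can be sketched as follows. The numeric values (bit line voltage, the two MTJ resistances) are hypothetical and chosen only to make the arithmetic visible; the reference current is assumed to sit midway between the two possible cell currents.

```python
# Hypothetical values, for illustration only.
V_BIT = 0.2                # volts applied to the bit line during a read
R_PARALLEL = 1_000.0       # ohms, spins parallel     -> low R  -> '0'
R_ANTIPARALLEL = 2_000.0   # ohms, spins antiparallel -> high R -> '1'

# Assumed reference: midway between the two possible read currents.
I_REF = 0.5 * (V_BIT / R_PARALLEL + V_BIT / R_ANTIPARALLEL)


def mram_read(r_mtj):
    """Return the stored bit by comparing the cell current with I_REF."""
    current = V_BIT / r_mtj          # Ohm's law: low R gives high I
    return 0 if current > I_REF else 1


print(mram_read(R_PARALLEL), mram_read(R_ANTIPARALLEL))  # 0 1
```

Nothing in `mram_read` changes the resistance, which mirrors the nondestructive nature of the read.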

The overall features of MRAMs are similar to those described for FRAMs but with higher read-write
speeds (comparable to SRAM), lower power consumption (mainly because memory-read is not
destructive), and virtually unlimited endurance. One of the first commercial MRAM chips, a 4 Mb device
with a 35 ns access time, was delivered by Freescale in 2006. A 16 Mb MRAM chip was announced by
Toshiba in 2007.

FIGURE 17.9. (a) MRAM cell; (b) MTJ device.
[Figure labels: (a) bit line, digit line, word line, variable resistor R; (b) free ferromagnetic layer (Fe-Co-Ni),
barrier layer (Al2O3 or MgO), fixed ferromagnetic layer (Fe-Co-Ni), antiferromagnetic pinning layer
(FeMn, IrMn, PtMn).]


17.7.3 Phase-Change RAM (PRAM) 


PRAM, also called OUM (ovonic unified memory), is a technology licensed by Ovonyx, under development by 
Intel, Samsung, and other companies, again intended as a future replacement for traditional flash memory. 

PRAM is essentially the same technology employed in R/W CDs and DVDs. It consists of a material
(a chalcogenide, normally GexSbyTez) whose phase can be changed very easily from crystalline to amorphous
and vice versa by the application of heat. In CDs and DVDs the heat is applied by a laser beam,
while in an electronic memory it is caused by an electric current. When in the crystalline state, the material
presents high reflectivity (for the optical CD and DVD) and low resistivity (for our present context),
while in the amorphous state its reflectivity is low and its resistivity is high. Consequently, a PRAM cell
can be modeled in the same way as the MRAM cell of Figure 17.9(a), where a variable resistor represents
the memory element.

Figure 17.10(a) depicts a phase-change element. It contains two electrodes, which are separated by the
chalcogenide and silicon oxide layers. At each bit position (crossing of horizontal and vertical memory
select lines) the SiO2 layer has a circular opening filled with a resistive electrode. When voltage pulses
are applied to the electrodes, current flows from one electrode to the other through the chalcogenide
and resistive electrode. The latter is heated, causing the former to change its phase in the region adjacent
to the resistive element, as indicated in the figure (the phase of the overall material is crystalline).
If the phase changes from crystalline to amorphous, then the electric resistance between the electrodes
becomes very high, while a change to crystalline causes it to be reduced.

The melting temperature of GexSbyTez is between 600°C and 700°C (it depends on x, y, z). If melted,
it becomes amorphous, while when kept in the 300°C to 600°C range it rapidly crystallizes (takes only
a few ns). Therefore, to store a '1', a high current must be applied, such that the material melts, while a
'0' is stored by a crystalline phase. To avoid crystallization while cooling (after a '1' has been stored),
quenching (that is, rapid cooling) is necessary, which is achieved with the proper choice of materials for
the electrodes and adjacent components.
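The temperature windows described above can be summarized in a small decision function. This is a rough sketch under stated assumptions: the melting point is fixed at a hypothetical 650°C (the text only gives a 600°C to 700°C range), and a pulse that falls outside both windows is assumed to leave the phase unchanged.

```python
T_MELT_C = 650.0      # hypothetical; the actual value depends on x, y, z
CRYST_MIN_C = 300.0   # lower edge of the crystallization window
CRYST_MAX_C = 600.0   # upper edge of the crystallization window


def next_phase(current_phase, peak_temp_c, rapid_cooling=True):
    """Phase of the programmed region after one heating pulse."""
    if peak_temp_c >= T_MELT_C:
        # Melted material stays amorphous only if it cools quickly;
        # slow cooling passes through the crystallization window.
        return "amorphous" if rapid_cooling else "crystalline"
    if CRYST_MIN_C <= peak_temp_c <= CRYST_MAX_C:
        return "crystalline"
    return current_phase  # assumed: pulse too weak (or in between) to change the phase


def stored_bit(phase):
    """Amorphous (high resistance) is read as '1', crystalline as '0'."""
    return 1 if phase == "amorphous" else 0


print(stored_bit(next_phase("crystalline", 680.0)))  # 1
print(stored_bit(next_phase("amorphous", 450.0)))    # 0
```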

Another important aspect to be taken care of regards the intensity of the pulses needed to write to this 
memory, which should be high only when the cell is in the amorphous state. 

A final concern regards the power consumption. The first PRAM cells employed a phase-change element
similar to that in Figure 17.10(a), which required relatively high currents (~1 mA) for programming. A more
elaborate construction is depicted in Figure 17.10(b), in which melting occurs only inside the ringlike
opening in the bottom electrode. This reduction of chalcogenide volume to be heated led to much smaller
programming currents (~0.1 mA at 1.5 V supply).

FIGURE 17.10. (a) Simple phase-change element used in PRAMs; (b) Improved construction to reduce the
melting volume (lower power consumption).


The main features projected for this technology are more or less similar to those projected for MRAM,
but very likely with smaller cell sizes (already at ~0.05 µm²/bit). Note that here too the memory-read
operation is not destructive. Prototype chips employing this technology (with 512 Mb) were demonstrated
in 2006 and are already commercially available.


17.8 Exercises 


1. NOR-type MP-ROM

Using pseudo-nMOS logic, draw a NOR-type MP-ROM whose contents are those in the LUT
(lookup table) of Figure E17.1.

Address   Content
000       0001
001       0010
010       0100
011       1000
100       0000
101       1111
110       1111
111       0000

FIGURE E17.1.

2. NAND-type MP-ROM

Using pseudo-nMOS logic, draw a NAND-type MP-ROM whose contents are those in the LUT of
Figure E17.1.

3. ROM with conventional gates

Using conventional gates, design a ROM whose contents are those in the LUT of Figure E17.1.

4. OTP-ROM

a. Briefly describe what an OTP-ROM is and when it is employed.

b. Instead of fuses or antifuses, what is the other construction technology (now common) for this
type of device?

5. EPROM

a. Briefly describe what an EPROM is and when it is employed.

b. What are its basic differences with respect to OTP-ROM and MP-ROM?

6. EEPROM

a. Briefly describe what an EEPROM is and when it is employed.

b. What are its basic differences with respect to EPROM?

7. Flash memory

a. Briefly describe what flash memory is and when it is employed.

b. What are its basic differences with respect to EEPROM?

c. Why is EEPROM being replaced with flash?

8. Flash arrays

a. Draw a 3×4 NOR-type flash array.

b. Draw a 3×4 NAND-type flash array.

9. Flash cells

a. Briefly compare the four flash cells shown in Figures 17.4(a) and (b) and 17.5(a) and (b).

b. Why is the split-gate cell popular for embedded applications?

10. Multibit flash cells

Briefly explain how 2-bit flash cells are constructed and work. Examine manufacturers’ data sheets
for additional details of each approach.

11. FRAM

a. Briefly describe what FRAM memory is and when it is employed.

b. Check in manufacturers’ data sheets for the current state of this technology. What are the densest
and fastest chips?

12. MRAM

a. Briefly describe what MRAM memory is and when it is employed.

b. Check in manufacturers’ data sheets for the current state of this technology. What are the densest
and fastest chips?

13. PRAM

a. Briefly describe what PRAM memory is and when it is employed.

b. Check in manufacturers’ data sheets for the current state of this technology. What are the densest
and fastest chips?


Programmable Logic Devices


Objective: This chapter describes CPLD and FPGA devices. Owing to their high gate/flip-flop 
density, wide range of I/O standards, large number of I/O pins, easy ISP (in-system programming), 
high speed, and decreasing cost, their presence in modern digital systems has grown substantially. 
Additionally, the widespread use of VHDL and Verilog, plus the high quality of current synthesis and
simulation tools, has also contributed to the wide adoption of such technology. CPLD/FPGA devices allow
the development of new products with a very short time to market, as well as easy update or modification 
of existing circuits. 


Chapter Contents 


18.1 The Concept of Programmable Logic Devices
18.2 SPLDs
18.3 CPLDs
18.4 FPGAs
18.5 Exercises


18.1 The Concept of Programmable Logic Devices


Programmable logic devices (PLDs) were introduced in the mid 1970s. The idea was to construct 
combinational logic circuits that were programmable. However, contrary to microprocessors, which can 
run a program but possess a fixed hardware, the programmability of PLDs was intended at the hardware 
level. In other words, a PLD is a general purpose chip whose hardware can be configured to meet particular 
specifications. 

The first PLDs were called PAL (programmable array logic) or PLA (programmable logic array), 
depending on the programming scheme (described later). They employed only conventional logic gates 
(no flip-flops), therefore targeting only the implementation of combinational circuits. To extend their 
coverage, registered PLDs were launched soon after, which included one flip-flop at each circuit output. 
With them, simple sequential functions could then be implemented as well. 

In the beginning of the 1980s, additional logic circuitry was added to each PLD output. The new output
cell, normally referred to as a macrocell, contained, besides the flip-flop, logic gates and multiplexers.
The cell was also programmable, allowing several modes of operation. Additionally, it provided a return
(feedback) signal from the circuit output back to the programmable array, which gave the PLD greater
flexibility. This new PLD structure was called generic PAL (GAL). A similar architecture was known as
PALCE (PAL CMOS electrically erasable/programmable device). All these chips (PAL, PLA, registered
PLD, and GAL/PALCE) are now collectively referred to as SPLDs (simple PLDs). Of these, GAL is the
only one still manufactured.

PLDs
   SPLDs
      PAL (mid 1970s)
      PLA (mid 1970s)
      Registered PAL/PLA (late 1970s)
      GAL/PALCE (early 1980s)
   CPLDs (mid 1980s)
   FPGAs (mid 1980s)

FIGURE 18.1. Summary of PLD evolution.

In the mid 1980s, several GAL devices were fabricated on the same chip using a sophisticated routing
scheme, more advanced silicon technology, and several additional features, like JTAG support (port for
circuit access/test defined by the Joint Test Action Group and specified in the IEEE 1149.1 standard) and
interface to several logic standards. Such an approach became known as CPLD (complex PLD). CPLDs
are currently in wide use thanks to their relatively high density, high performance, and low cost (some
cost nearly as low as $1), which makes them a popular choice in many applications, including consumer
electronics, computers, and automotive systems.

Finally, also in the mid 1980s, FPGAs (field programmable gate arrays) were introduced. FPGAs differ 
from CPLDs in architecture, technology, built-in features, and cost. They target mainly complex, large-size, 
top-performance designs, like gigabit transceivers, high-complexity switching, HDTV, wireless, and other 
telecommunication applications. 

A final remark is that CPLDs are essentially nonvolatile, while FPGAs are volatile. CPLDs normally 
employ EEPROM (Section 17.5) or flash memory (Section 17.6) to store the interconnects, while FPGAs 
employ SRAM (Section 16.2). Consequently, the latter needs a configuration nonvolatile memory from which 
the program is loaded at power up. A table illustrating the evolution of PLDs is presented in Figure 18.1. 


18.2 SPLDs 


As mentioned above, PAL, PLA, and GAL devices are collectively called SPLDs, which stands for simple 
PLDs. A description of each one of these architectures follows. 


18.2.1 PAL Devices 


PAL (programmable array logic) chips were introduced by Monolithic Memories in the mid 1970s. Their
basic architecture is illustrated in Figure 18.2, where the little ovals represent programmable connections.
As can be seen, the circuit is composed of a programmable array of AND gates followed by a fixed
array of OR gates. This implementation is based on the fact that any combinational function can be
represented by a sum-of-products (SOP), as seen in Section 5.3. The products are computed by the AND
gates, while the sum is computed by the OR gate that follows.

A PAL-based example is depicted in Figure 18.3, which computes the combinational functions
f1 = a·b + a'·b'·c'·d' + b·d and f2 = a·b·c + d. The dark ovals indicate a connection. The outputs of
nonprogrammed AND gates are set to zero.
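The way a PAL evaluates these two functions can be mimicked in software. The sketch below is a behavioral model only, with hypothetical names (`F1_TERMS`, `F2_TERMS`, `and_gate`, `pal_output`): each dictionary plays the role of one programmed AND gate, listing which input literals are connected to it, and each output is the fixed OR of its own group of AND gates. Nonprogrammed AND gates are simply left out of the lists, which is equivalent to forcing their outputs to zero.

```python
from itertools import product

# One dict per programmed AND gate: True selects the input itself,
# False selects its complement (a', b', ...).
F1_TERMS = [
    {"a": True, "b": True},                              # a.b
    {"a": False, "b": False, "c": False, "d": False},    # a'.b'.c'.d'
    {"b": True, "d": True},                              # b.d
]
F2_TERMS = [
    {"a": True, "b": True, "c": True},                   # a.b.c
    {"d": True},                                         # d
]


def and_gate(term, inputs):
    """Product of the literals connected to one AND gate."""
    return all(inputs[name] == polarity for name, polarity in term.items())


def pal_output(terms, inputs):
    """Fixed OR of the AND gates that belong to one output."""
    return any(and_gate(term, inputs) for term in terms)


# Print the complete truth table of f1 and f2.
for a, b, c, d in product([False, True], repeat=4):
    inputs = {"a": a, "b": b, "c": c, "d": d}
    print(int(a), int(b), int(c), int(d),
          int(pal_output(F1_TERMS, inputs)), int(pal_output(F2_TERMS, inputs)))
```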

As mentioned earlier, the main limitation of this approach is that it is appropriate only for the
implementation of combinational functions. To circumvent this problem, registered PALs were launched toward
the end of the 1970s. These included a flip-flop at each output (after each OR gate in Figure 18.2), thus
allowing the construction of sequential circuits as well (though only very simple ones).

FIGURE 18.2. Basic PAL architecture.

FIGURE 18.3. PAL-based example where two combinational functions (f1, f2) of four variables (a, b, c, d) are
implemented: f1 = a·b + a'·b'·c'·d' + b·d and f2 = a·b·c + d. Dark ovals indicate a connection. The output
of a nonprogrammed AND gate is set to zero.

An example of a then popular PAL chip is the PAL16L8 device, which contained 16 inputs and eight
outputs (though only 18 I/O pins were available because it came in a 20-pin DIP package, with two pins
reserved for the power supply, leaving 10 IN pins, two OUT pins, and six IN/OUT pins). Its registered
counterpart was the 16R8 device.

The early technology employed in the fabrication of PALs was bipolar (Chapter 8) with fuses or antifuses
normally employed to establish the (nonvolatile) array connections. They operated with a 5 V supply
voltage and drew a large supply current for such small devices (around 200 mA with open
outputs) with a maximum frequency around 100 MHz.


18.2.2 PLA Devices 


PLA (programmable logic array) chips were introduced in the mid 1970s by Signetics. The basic
architecture of a PLA is illustrated in Figure 18.4. Comparing it to that in Figure 18.2, we observe that the
only fundamental difference between them is that while a PAL has programmable AND connections
and fixed OR connections, both are programmable in a PLA. The obvious advantage is greater flexibility
because more combinational functions (more product terms) can be implemented with the same amount
of hardware. On the other hand, the extra propagation delay introduced by the additional programmable
interconnections lowered their speed.

An example of a then popular PLA chip is the Signetics PLS161 device. It contained 12 inputs and 
eight outputs, with a total of 48 12-input AND gates, followed by a total of eight 48-input OR gates. 
At the outputs, additional programmable XOR gates were also available. 
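The benefit of a programmable OR array is that one product term can feed several outputs. The following sketch models that idea with two hypothetical functions, g1 = a·b + c and g2 = a·b + c', which share the product a·b; the names `AND_PLANE`, `OR_PLANE`, and `pla_eval` are illustrative only and do not describe the PLS161.

```python
# Programmable AND array: one dict per product term
# (True = input, False = complemented input).
AND_PLANE = [
    {"a": True, "b": True},   # term 0: a.b (shared by both outputs)
    {"c": True},              # term 1: c
    {"c": False},             # term 2: c'
]

# Programmable OR array: which product terms are connected to each output.
OR_PLANE = {
    "g1": [0, 1],             # g1 = a.b + c
    "g2": [0, 2],             # g2 = a.b + c'
}


def pla_eval(inputs):
    """Evaluate every output of the PLA for one input combination."""
    products = [
        all(inputs[name] == polarity for name, polarity in term.items())
        for term in AND_PLANE
    ]
    return {out: any(products[i] for i in terms) for out, terms in OR_PLANE.items()}


print(pla_eval({"a": True, "b": True, "c": False}))   # {'g1': True, 'g2': True}
print(pla_eval({"a": False, "b": False, "c": True}))  # {'g1': True, 'g2': False}
```

In a PAL the same two functions would need four AND gates, because each OR gate owns its product terms; here three suffice.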


FIGURE 18.4. Basic PLA architecture.

The technology then employed in the fabrication of PLAs was the same as that of PALs. Though PLAs
are also obsolete now, they reappeared a few years ago as a building block in the first family of
low-power CPLDs, the CoolRunner family from Xilinx.


18.2.3 GAL Devices 


The GAL (generic PAL) architecture was introduced by Lattice in the beginning of the 1980s. It contains
several important improvements over the first PALs. First, a more sophisticated output cell (called
OLMC, for output logic macrocell, or simply macrocell) was constructed, which included a flip-flop, an
XOR gate, and five multiplexers, with internal programmability that allowed several modes of operation.
Second, a return (feedback) signal from the macrocell back to the programmable array was also
included, giving the circuit more versatility. Third, EEPROM was employed instead of fuses/antifuses
or PROM/EPROM to store the interconnects. Finally, an electronic signature for identification and
protection was also made available.

This type of device is illustrated in Figure 18.5, which shows the GAL16V8 chip. The largest part of the
diagram comprises the programmable AND array, where the little circles represent programmable
interconnections. As can be seen, the circuit is composed of eight sections, each with eight AND gates. The
eight AND outputs in each section are connected to a fixed OR gate located inside the macrocell (shown
later), which completes the general PAL architecture of Figure 18.2. The circuit has 16 inputs and eight
outputs, hence the name 16V8. However, because its package has only 20 pins, the actual configuration
is eight IN pins (pins 2-9) and eight IN/OUT pins (pins 12-19), plus CLK (pin 1), /OE (output enable,
pin 11), GND (pin 10), and VDD (pin 20). Observe that at each output there is a macrocell.

The macrocell’s internal diagram is shown in Figure 18.6. As mentioned above, it contains the fixed 
eight-input OR gate to which the programmable ANDs are connected. It contains also a programmable 
XOR gate followed by a DFF. A multiplexer allows the output signal to be chosen between that coming 
directly from the OR/XOR gate (for combinational functions) and that coming from the DFF (for sequential
functions), while another multiplexer allows the return (feedback) signal to be picked from the DFF,
from an adjacent macrocell, or from its IN/OUT pin. Notice the presence of three more multiplexers, 
one for selecting whether the output of the top AND gate should or should not be connected to the OR 
gate, another for choosing which signal should control the output tri-state buffer, and finally another to 
choose whether a zero or the macrocell’s output should be sent to the other adjacent macrocell. 
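The core data path of the macrocell (OR gate, programmable XOR, DFF, and output multiplexer) can be captured in a few lines. This is a deliberately simplified behavioral sketch: the class and attribute names are hypothetical, and the feedback multiplexer, the tri-state buffer control, and the adjacent-macrocell multiplexers are omitted.

```python
class Macrocell:
    """Simplified GAL macrocell: OR -> programmable XOR -> optional DFF."""

    def __init__(self, invert=False, registered=False):
        self.invert = invert          # XOR programming bit (output polarity)
        self.registered = registered  # output mux: DFF (True) or OR/XOR (False)
        self.q = False                # state of the DFF

    def combinational(self, product_terms):
        """Fixed OR of the AND-array outputs followed by the XOR gate."""
        return any(product_terms) ^ self.invert

    def clock(self, product_terms):
        """Rising clock edge: the DFF samples the OR/XOR result."""
        self.q = self.combinational(product_terms)

    def output(self, product_terms):
        """Output multiplexer selects the registered or combinational path."""
        return self.q if self.registered else self.combinational(product_terms)


cell = Macrocell(registered=True)
cell.clock([False, True])            # DFF captures 1
print(cell.output([False, False]))   # True: output holds the registered value
```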

As mentioned earlier, GAL devices are still manufactured (by Lattice, Atmel, etc.). CMOS technology
is now employed, which includes EEPROM or flash memory for interconnect storage, 3.3 V supply
voltage, and a maximum frequency around 250 MHz.


18.3 CPLDs 


As mentioned before, SPLDs (simple PLDs) were replaced with CPLDs (complex PLDs), originally 
obtained by constructing and associating several SPLDs on the same chip. 


18.3.1 Architecture 


The basic approach to the construction of a CPLD is illustrated in Figure 18.7. It consists of several PLDs 
(GALs, in general) fabricated on the same chip, which communicate through a complex and programmable
interconnecting array. I/O drivers and a clock/control unit are also needed. Several additional
features are inherent to modern CPLDs, notably JTAG support, a variety of I/O standards (LVTTL,
LVCMOS, etc.), a large number of user I/O pins, and low-power operation.

FIGURE 18.5. GAL16V8 chip.


The Xilinx XC9500 CPLD series is an example of a CPLD constructed according to the general
architecture depicted in Figure 18.7 (and the same is true for the CoolRunner family, though this employs
PLAs instead of GALs). It contains n PLDs, each resembling a V18 GAL (therefore similar to the 16V8
architecture of Figure 18.5, but with 18 programmable AND arrays instead of eight, hence with 18
macrocells each), where n = 2, 4, 6, 8, 12, or 16. With these values of n, CPLDs with 18n = 36 up to 288 macrocells
are obtained. This fact can be verified in the XC9500 data sheets available at www.xilinx.com.

FIGURE 18.6. Macrocell diagram.

FIGURE 18.7. Basic CPLD architecture, which consists of n PLDs (GALs, in general) interconnected through a
programmable switch array, plus I/O bank and clock/control unit.

The Altera MAX3000 CPLD series is another example of a CPLD constructed according to the general
architecture depicted in Figure 18.7 (and the same is true for the MAX7000 series). A MAX3000 CPLD contains
n PLDs, each resembling a V16 GAL (therefore similar to the 16V8 architecture of Figure 18.5 but with 16
programmable AND arrays instead of eight, hence with 16 macrocells each), where n = 2, 4, 8, 16, or 32. Altera
calls these individual PLDs by the name LAB (logic array block), and the interconnect array by PIA
(programmable interconnect array). With the values of n listed above, CPLDs with 16n = 32 up to 512 macrocells
are obtained. This architecture can be verified in the MAX3000A data sheets available at www.altera.com.
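The macrocell totals quoted for both families follow directly from the number of PLD blocks times the macrocells per block, as the short calculation below shows (the helper name `macrocell_counts` is just for illustration).

```python
XC9500_BLOCKS = [2, 4, 6, 8, 12, 16]    # values of n for XC9500 (18 macrocells per block)
MAX3000_BLOCKS = [2, 4, 8, 16, 32]      # values of n for MAX3000 (16 macrocells per block)


def macrocell_counts(blocks, macrocells_per_block):
    """Total macrocells for each device size in a family."""
    return [n * macrocells_per_block for n in blocks]


print(macrocell_counts(XC9500_BLOCKS, 18))   # [36, 72, 108, 144, 216, 288]
print(macrocell_counts(MAX3000_BLOCKS, 16))  # [32, 64, 128, 256, 512]
```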

Besides lower power consumption, recent CPLDs exhibit higher versatility and more features than
the traditional GAL-based architecture described above. As an example, Figure 18.8 depicts the overall
approach used in the Altera MAXII CPLD series. As shown in Figure 18.8(a), a LAB (logic array block) is
no longer a GAL, but a collection of LEs (logic elements) to which a larger interconnecting array is made
available. Additionally, Figure 18.8(b) shows that each LAB is composed of ten LEs with local as well
as global (row plus column) interconnects. Consequently, this type of CPLD is more like a “simplified”
FPGA than a traditional CPLD.

FIGURE 18.8. Basic architecture of the Altera MAXII CPLDs. (a) A LAB (logic array block) is no longer a GAL,
but a collection of LEs (logic elements), with finer interconnecting buses; (b) Each LAB is composed of ten LEs
with local as well as global interconnections (this is a “simplified” FPGA).

FIGURE 18.9. Simplified diagram of the LE employed in the Altera MAXII CPLDs.

A simplified diagram for the LE mentioned above is presented in Figure 18.9. This circuit differs
substantially from the traditional PLD-based approach (which employs a PAL array plus a macrocell)
described earlier. First, instead of a PAL array, it employs a lookup table (LUT) to implement combinational
logic. Because it is a four-input LUT, it can implement any combinational function of four variables,
therefore spanning a wider binary space. For the case when more than four variables are needed
(or for table sharing), a LUT chain output is provided. Likewise, a register chain output is provided for
large sequential functions. A carry chain is also available, thus optimizing addition/subtraction (observe
the XOR gate on the left and check the circuits in Figure 12.13(a)).
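A four-input LUT is simply a 16-entry memory addressed by the four inputs, which is why it can realize any function of four variables. The sketch below illustrates this; the function names `make_lut` and `lut_read` are hypothetical, and the convention that input a is the most significant address bit is an assumption of this example.

```python
from itertools import product


def make_lut(func):
    """Fill the 16-entry table by evaluating func for every input combination."""
    return [int(bool(func(a, b, c, d))) for a, b, c, d in product((0, 1), repeat=4)]


def lut_read(lut, a, b, c, d):
    """The four inputs form the address of the stored bit (a is the MSB)."""
    return lut[(a << 3) | (b << 2) | (c << 1) | d]


# Two unrelated functions, each stored in one identical piece of hardware.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
f2 = make_lut(lambda a, b, c, d: (a & b & c) | d)

print(lut_read(xor4, 1, 0, 1, 1))  # 1
print(lut_read(f2, 1, 1, 1, 0))    # 1
```

Changing the implemented function only changes the 16 stored bits, not the circuit, in contrast to a PAL array, where the number of available product terms limits what can be implemented.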

To conclude this section, we present two tables that summarize the main features of current CPLDs. 
The first (Figure 18.10) presents the CPLDs offered by Xilinx, while the second (Figure 18.11) shows those 
from Altera. It is important to mention, however, that other companies, like Atmel and Lattice, also offer 
this kind of device. 


18.3.2 Xilinx CPLDs 


Looking at Figure 18.10, we observe three CPLD series (XC9500, CoolRunner XPLA3, and CoolRunner II)
whose overall performance and other features grow from left to right. The first table line
shows the power supply options, which go from 5 V down to 1.8 V (lower power consumption),
while the second line shows the building blocks described earlier, that is, GAL for XC9500 and PLA
for CoolRunner.

The number of macrocells (number of GAL or PLA sections) appears in the third line, that is, 36-288 
for XC9500 (compare these numbers to those calculated earlier) and 32-512 for CoolRunner. In the next 
line, the number of flip-flops is shown, which is one per macrocell (in the CoolRunner II they are
dual-edge flip-flops).

The number of I/O pins appears next and can be relatively large. The set of I/O standards supported by
CPLDs is another important parameter because in modern/complex designs communication with specialized
units (memories, other ICs, specific buses, etc.) is often required (see I/O standards in Section 10.9).

The following line regards the use of Schmitt triggers (Section 11.13) in the input pads (to increase
noise immunity), which is a common feature in modern devices. As can be seen, they are available in 
the XC9500 and CoolRunner II series (with a much higher hysteresis in the latter). Another important 
parameter is the maximum system frequency, which can be higher than 300 MHz. Observe that the larger 
the CPLD, the lower the maximum frequency. 

The next line shows the power consumption, a crucial parameter in portable products (digital cameras, 
MP3 players, etc.). The values shown are for the smallest CPLD in each series, which is operating with 
nearly all cells in reduced-power (so lower speed) mode, and for the largest CPLD, which is operating 
with all cells in normal power (so regular speed) mode, with each pair of values given for the device 
in standby mode (0Hz) and operating at 100 MHz. As can be seen, the CoolRunner devices present 
very low (near zero) power consumption in standby, and also a relatively low consumption in high 
frequency (CoolRunner II only). Indeed, CoolRunner was the first low-power CPLD. The reduced power
in standby is due to the replacement of the sense amplifiers used in the SOPs with fully digital gates that
drain no static current (Xilinx calls this approach “RealDigital”). Besides the avoidance of sense amplifiers and
the use of a low supply voltage (1.8 V), the lower power consumption of CoolRunner II at high frequency 
is due also to a reduced global clock frequency, obtained with a global clock divider, with the clock 
frequency reestablished locally (at the macrocells) by a clock doubler. 


Feature                        XC9500                      CoolRunner XPLA3          CoolRunner II

Power supply                   5 V [...]                   [...]                     1.8 V
Building block                 GAL                         PLA                       PLA
Number of macrocells           36-288                      32-512                    32-512
Number of flip-flops           1 per macrocell,            1 per macrocell,          1 per macrocell,
                               single edge                 single edge               dual edge
Number of user I/O pins        36-192                      36-260                    33-270
Supported I/O standards (*)    LVTTL (3.3 V)               TTL (5 V)                 LVTTL (3.3 V)
                               LVCMOS (2.5 V, 1.8 V)       LVTTL (3.3 V)             LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V)
                                                                                     SSTL_3, SSTL_2
                                                                                     HSTL-18
Input hysteresis               [...]                       [...]                     [...]
Maximum system frequency       [...]                       [...]                     [...]
Current consumption (ICC):
   Standby (f = 0 Hz)          12 mA (5) - 500 mA (6)      ~20 µA                    ~20 µA
   f = 100 MHz                 27 mA (5) - 700 mA (6)      10 mA (4) - 240 mA (3)    3.5 mA (4) - 90 mA (3)
Technology                     CMOS 0.35 µm, flash         CMOS 0.35 µm, EEPROM      CMOS 0.18 µm, flash
Sense amplifiers               Analog                      RealDigital               RealDigital
Clock divider and clock        N                           N                         Y
doubler

(*) See Section 10.9 for details
(1) 5 V, 288 macrocells
(2) 2.5 V, 36 macrocells
(3) 512 macrocells
(4) 32 macrocells
(5) 2.5 V, 36 macrocells, nearly all in low-power mode
(6) 5 V, 288 macrocells, all in regular power mode

FIGURE 18.10. Summary of current Xilinx CPLDs (overall performance and other features grow from left
to right).




The table of Figure 18.10 also shows the technology used to fabricate these devices, which is 0.35 µm
or 0.18 µm CMOS (Chapters 9-10), with either EEPROM (Section 17.5) or flash (Section 17.6) nonvolatile
memory employed to store the interconnects.


18.3.3 Altera CPLDs 


A similar set of features can be observed in the table of Figure 18.11, which shows the three CPLD series 
(MAX7000, MAX3000, and MAXII) currently offered by Altera (again, the overall performance and other 
features grow from left to right). 

The first line shows the power supply options, which again go from 5 V down to 1.8 V (lower power 
consumption). The second line shows the building blocks described earlier, that is, GAL for MAX7000 
and MAX3000, and LAB (logic array block) composed of LEs (logic elements) for MAXII. 

The next two lines show the number of macrocells (number of GAL sections) or of LEs, followed 
by the respective number of flip-flops. As can be seen, these numbers are also comparable to those for 
the Xilinx CPLDs, except for the number of flip-flops in the MAXII devices, which can be substantially 
larger. 


Feature                        MAX7000                     MAX3000                     MAXII
                               (device series; performance grows from left to right)

Power supply                   [...] 2.5 V                 [...]                       [...] 2.5 V, 1.8 V
Building block                 GAL                         GAL                         LAB (10 LEs)
Number of macrocells           [...]                       32-512                      192-1700 (1)
Number of logic elements                                                               240-2210
Number of flip-flops           1 per macrocell             1 per macrocell             1 per logic element
(all single-edge)
Number of user I/O pins        36-212                      34-208                      80-272
Supported I/O standards (*)    TTL (5 V)                   TTL (5 V)                   LVTTL (3.3 V, 2.5 V, 1.8 V)
                               LVTTL (3.3 V)               LVTTL (3.3 V)               LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V)
                               LVCMOS (3.3 V [...]         LVCMOS (3.3 V, 2.5 V [...]  PCI
Input hysteresis               [...]                       [...]                       160 mV, 300 mV
Maximum system frequency       91 MHz - 175 MHz            116 MHz (4) - 227 MHz (3)   304 MHz
Current consumption (ICC):
   Standby (f = 0 Hz)          15 mA (6) - 300 mA (7)      10 mA (8) - 350 mA (9)      2 mA - 12 mA
   f = 100 MHz                 [...] - 600 mA (7)          15 mA (8) - 420 mA (9)      30 mA (10) - 550 mA (11)
Technology                     CMOS EEPROM                 CMOS EEPROM                 [...]
User flash                                                                             [...]

(*) See Section 10.9 for details
(1) Macrocell equivalence
(2) 256 macrocells
(3) 32 macrocells
(4) 512 macrocells
(5) [...]
(6) 2.5 V, 32 macrocells, nearly all in low-power mode
(7) 5 V, 256 macrocells, all in regular power mode
(8) 32 macrocells, all in low-power mode
(9) 512 macrocells, all in high-power mode
(10) 1.8 V, 240 logic elements
(11) 3.3 V, 2210 logic elements

FIGURE 18.11. Summary of current Altera CPLDs (overall performance and other features grow from left
to right).

Information regarding I/O follows, showing the number of pins, the types of I/O standards, and the
use (or not) of Schmitt triggers. Next, the maximum system frequencies are listed, which can again be as 
high as ~300 MHz. 

The power consumption is shown next, using the same methodology adopted in Figure 18.10. Like
CoolRunner, MAXII is a low-power CPLD. Even though MAXII has a higher power consumption in
standby, in regular operation the two are comparable.

The technologies used to fabricate these devices are listed in the following line, with again either
EEPROM (Section 17.5) or flash (Section 17.6) nonvolatile memory employed to store the interconnects.
Finally, a feature that is typical of FPGAs (user memory) is shown for the MAXII series (recall that MAXII
is indeed a simplified FPGA).


18.4 FPGAs 


Field programmable gate array (FPGA) devices were introduced by Xilinx in the mid 1980s. As mentioned
earlier, they differ from CPLDs in terms of architecture, technology, built-in features, size, performance,
and cost. To describe the construction and main features of FPGAs, two top-performance devices (Virtex
5 from Xilinx and Stratix III from Altera) will be used as examples.


18.4.1 FPGA Technology 


The technology employed in the fabrication of the two devices mentioned above is 65nm CMOS 
(Chapters 9-10), with all-copper metal layers. A low-K dielectric is used between the copper layers to 
reduce interconnection capacitances. The maximum internal clock frequency achieved in these two 
devices is 550 MHz for Virtex 5 and 600 MHz for Stratix III. 

The typical supply voltage for 65nm technology is $V_{DD} = 1$ V, which reduces the dynamic
power consumption ($P_{dyn} = C V_{DD}^2 f$) by approximately 30% with respect to the previous technology
node, 90nm, for which the power supply was 1.2 V. Note, however, that for the same number of
gates and equivalent routing the equivalent capacitance (C) is lower, reducing the dynamic power
consumption even further. Architectural improvements were also introduced in these two devices,
like lower interconnect capacitances and the use of low-power transistors in the noncritical sections,
allowing a combined relative (that is, for the same number of gates and same frequency) dynamic
power reduction of over 40%. Recall, however, that the new devices are denser and faster, hence
offsetting such savings.
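
As a rough check of the 30% figure, assume (as an idealization only) that the capacitance C and the frequency f are the same at both technology nodes. The ratio of dynamic powers then depends only on the two supply voltages quoted above:

```latex
\[
\frac{P_{dyn}^{(65\,\mathrm{nm})}}{P_{dyn}^{(90\,\mathrm{nm})}}
  = \frac{C\,(1.0\,\mathrm{V})^{2}\,f}{C\,(1.2\,\mathrm{V})^{2}\,f}
  = \frac{1}{1.44} \approx 0.69
\]
```

This corresponds to a reduction of about 31%, in line with the approximate value given in the text. The further savings mentioned there (lower C, low-power transistors) are not captured by this simple ratio.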

Regarding the static power consumption, however, even though it normally decreases with the sup- 
ply voltage, that is not exactly so with such small transistors (65nm gate length) because current leakage 
is no longer negligible (due mainly to subthreshold leakage between drain and source and to gate oxide 
tunneling). One of the improvements developed to cope with this problem is the triple-oxide process. 
Previous (90nm) FPGAs used two oxide thicknesses, that is, one very thin (thinox), employed basically 
in all core transistors, and the other thick (thickox), employed in the higher voltage tolerant transistors 
of the I/O blocks. In the FPGAs mentioned above, a medium thickness oxide (midox) was included. This 
transistor has a higher threshold voltage and lower speed than thinox transistors, but it also has a much 
lower leakage and is employed in the millions of configuration memory cells, which are not critical, as 
well as in other FPGA sections where top performance is not required. As a rough preliminary estimate 
for the total power budget of high performance FPGAs, ~1 W can be considered for small devices, ~5 W 
for medium devices, and ~10W for large ones.

Another improvement, which was adopted in the Stratix III FPGA, consists of using strained silicon
(described in Section 9.8) to increase the transistors’ speed. All Stratix III transistors are strained, allowing 
many of them to be constructed with midox instead of thinox, thus preventing leakage without compro- 
mising speed. 


18.4.2 FPGA Architecture 


The overall architecture of FPGAs is depicted in Figure 18.12, which presents a simplified view of the
Virtex 5 and Stratix III FPGAs. The former is illustrated in Figure 18.12(a), where the programmable logic
blocks are called CLB (configurable logic block), with each CLB composed of two Slices (one of type L,
the other of type L or M). Stratix III is depicted in Figure 18.12(b), where the programmable logic blocks
are called LAB (logic array block), with each LAB composed of 10 ALMs (adaptive logic modules).
Because the programmable logic blocks are relatively complex, this type of architecture is considered to
be medium grained (fine-grained structures have been considered, but a move in the opposite direction
proved to be more efficient; indeed, every new FPGA generation has seen programmable logic blocks
with increased size and complexity).


FIGURE 18.12. Simplified FPGA architectures: (a) Xilinx Virtex 5; (b) Altera Stratix III. Additional blocks (SRAM,
DSP, etc.) and additional routing not shown.


18.4.3 Virtex CLB and Slice 


As shown in Figure 18.12(a), the basic building block in the Xilinx Virtex 5 FPGA is the CLB, which is 
composed of two Slices, called Slice L or Slice M. The difference between them is that the LUT (lookup 
table) in the latter, when not employed to implement a function, can be used as a 64-bit distributed SRAM, 
which is made available to the user as general purpose memory, or it can be used as a general purpose 
32-bit shift register. Approximately one-fourth of the total number of Slices in the Virtex 5 FPGA are of 
type M. 

A simplified view of a Slice (type L) is presented in Figure 18.13. It is composed of four similar sections
(logic cells), containing a total of four six-input LUTs, four DFFs, and a fast carry chain. The total number 
of CLBs, Slices, and registers in the Virtex 5 FPGA can be seen in Figure 18.18. The maximum internal 
clock frequency is 550 MHz. 
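
To make the shift-register use of the M-type Slice more concrete, the following is a minimal sketch of a 32-bit shift register without reset, a coding style that synthesis tools can often map onto LUT-based shift registers. The entity name shift_reg32 and all port names are hypothetical, and whether a given compiler actually places this code in a Slice M LUT (rather than in 32 flip-flops) depends on the tool and its settings.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY shift_reg32 IS   -- hypothetical name, for illustration only
   PORT (clk, ce, din: IN STD_LOGIC;
         dout: OUT STD_LOGIC);
END shift_reg32;

ARCHITECTURE rtl OF shift_reg32 IS
   SIGNAL sr: STD_LOGIC_VECTOR(31 DOWNTO 0);
BEGIN
   PROCESS (clk)
   BEGIN
      IF (clk'EVENT AND clk='1') THEN
         IF ce='1' THEN
            -- Shift left by one position; the new bit enters at index 0
            sr <= sr(30 DOWNTO 0) & din;
         END IF;
      END IF;
   END PROCESS;
   -- The bit that entered 32 enabled clock edges earlier
   dout <= sr(31);
END rtl;
```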


FIGURE 18.13. Simplified diagram of the Slice L unit employed in the Virtex 5 FPGA, which contains four
six-input LUTs, four DFFs, and a fast carry chain.


18.4.4 Stratix LAB and ALM


As shown in Figure 18.12(b), the basic building block in the Altera Stratix III FPGA is called LAB, which 
is composed of 10 ALMs. A simplified ALM diagram is presented in Figure 18.14. Notice that this circuit 
is not much different from that seen in Figure 18.9. It contains two six-input LUTs, two adders for fast 
arithmetic/carry chains, plus two DFFs and register chain for the implementation of large sequential 
functions. Even though the ALM can operate with two six-input LUTs, note that it has only eight inputs, 
so four of them are common to both tables (see detail on the left of Figure 18.14). Still, having eight inputs 
confers great flexibility to the LUTs, which can then be configured in several ways, including two
completely independent 4-LUTs (four-input LUTs), or a 3-LUT plus a 5-LUT, also independent, etc. As in
Virtex 5, Stratix III can also have the unused ALMs converted into user (distributed) SRAM. ALMs that
allow such usage are called MLAB ALMs, each providing 640 bits of user RAM; 5% of the total number
of ALMs are of this type, thus resulting in a total of approximately 32 bits/ALM of extra user memory.
The MLAB ALMs also allow the construction of shift registers, FIFO memory, or filter delay lines. The
total number of LABs, ALMs, and registers in the Stratix III FPGA can be seen in Figure 18.18. The
maximum internal clock frequency is 600 MHz.


18.4.5 RAM Blocks 


Besides the indispensable programmable logic blocks, modern FPGAs also include other blocks, which 
are helpful in the development of large and/or complex designs. These blocks normally are the following: 
SRAM blocks, DSP blocks, and PLL blocks. 

Because most designs require memory, the inclusion of user SRAM is one of the most common and 
helpful features. Both FPGAs mentioned above contain user SRAM blocks. To illustrate this point, 
Figure 18.15 shows the floor plan of a Stratix III FPGA (the smallest device in the E series), where, besides 
the ALMs and I/Os (which are indispensable), RAM, DSP, PLL, and DLL blocks can also be observed. 

The SRAM blocks in this FPGA are divided into three categories, called M9k (9216 bits), M144k
(147,456 bits), and MLAB (640 bits). The first two types of blocks can be observed in the diagram of
Figure 18.15; they operate as simple dual-port memory at up to 600 MHz. The third type (MLAB) can
also operate at 600 MHz but is obtained from ALMs of type MLAB, as explained earlier (~32 bits/ALM
result when the total number of ALMs is considered). As mentioned above, the MLAB ALMs can also
operate as shift registers, FIFO, or filter delay lines. The total number of user SRAM bits can be seen in
the summary of Figure 18.18 and goes from 2.4 Mb (smallest device in the L series) to 20.4 Mb (largest
device in the L series).


FIGURE 18.14. Simplified diagram of the ALM unit employed in the Stratix III FPGA, which contains two
six-input LUTs (though inputs are not completely independent), two DFFs, two adders, plus carry and register
chains.

FIGURE 18.15. Stratix III floor plan (smallest device in the E series).

The Virtex 5 FPGA also contains user SRAM blocks, each of size 36 kbits, and configurable in several
ways (36k x 1, 16k x 2, 8k x 4, etc.), totaling 1.15 Mb in the smallest device and 10.4 Mb in the largest one.
Moreover, recall that the memory used to construct the LUTs in the M-type Slices can be employed as
distributed user SRAM when the LUTs are not needed. This memory can also be configured as single-,
dual-, or quad-port SRAM or as a shift register. As in Stratix III, the distributed RAM adds roughly 30%
more bits to the existing user RAM blocks. The resulting total number of bits can be seen in the summary
of Figure 18.18 and goes from 1.47 Mb to 13.8 Mb. All SRAM cells can operate at up to 550 MHz.
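
The simple dual-port operation mentioned above (one write port, one read port) can be sketched in VHDL as shown below. This is only an illustrative template: the entity name, the generics addr_bits and data_bits, and the port names are hypothetical. The default values give 1024 words of 9 bits, that is, 9216 bits, which happens to match the size of one M9k block; whether a compiler maps this code onto a dedicated RAM block or onto distributed memory is tool dependent.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;

ENTITY simple_dual_port_ram IS   -- hypothetical name, for illustration only
   GENERIC (addr_bits: INTEGER := 10;   -- assumed depth: 2**10 = 1024 words
            data_bits: INTEGER := 9);   -- assumed word width
   PORT (clk, we: IN STD_LOGIC;
         wr_addr, rd_addr: IN STD_LOGIC_VECTOR(addr_bits-1 DOWNTO 0);
         din: IN STD_LOGIC_VECTOR(data_bits-1 DOWNTO 0);
         dout: OUT STD_LOGIC_VECTOR(data_bits-1 DOWNTO 0));
END simple_dual_port_ram;

ARCHITECTURE rtl OF simple_dual_port_ram IS
   TYPE ram_array IS ARRAY (0 TO 2**addr_bits-1) OF
      STD_LOGIC_VECTOR(data_bits-1 DOWNTO 0);
   SIGNAL ram: ram_array;
BEGIN
   PROCESS (clk)
   BEGIN
      IF (clk'EVENT AND clk='1') THEN
         -- Write port
         IF we='1' THEN
            ram(TO_INTEGER(UNSIGNED(wr_addr))) <= din;
         END IF;
         -- Read port (registered output)
         dout <= ram(TO_INTEGER(UNSIGNED(rd_addr)));
      END IF;
   END PROCESS;
END rtl;
```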


18.4.6 DSP Blocks 


DSP (digital signal processing) is often required in applications involving audio or video, among others. 
Such processing (FIR/IIR filtering, FFT, DCT, etc.) is accomplished by three basic elements: multipliers, 
accumulators (adders), and registers. To make this type of application simpler to implement (route) and 
also faster (less routing delays), special DSP blocks are included in modern FPGAs (see Figures 18.15 
and 18.18), normally containing large parallel multipliers, MAC (multiply-and-accumulate) circuits, and 
shift registers. 

Each DSP block available in the Stratix III FPGA contains eight 18x18 multipliers (which can also be
configured as 9x9, 12x12, or 36x36 multipliers), plus associated MAC circuits and shift registers. The
total number of DSP blocks varies from 27 (in the smallest L-series device) to 96 (largest L-series device),
and they are capable of operating at up to 550 MHz.

The Virtex 5 FPGA also contains DSP blocks (called DSP48E Slices). Each block includes a 25x18
multiplier, plus MACs, registers, and several operating modes. The total number of such blocks varies 
from 32 (in the smallest device) to 192 (in the largest device), with a maximum frequency of 550 MHz. 
Information regarding the DSP blocks can also be seen in the summary of Figure 18.18. 
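
The multiply-and-accumulate operation at the core of these blocks can be sketched as follows. The operand widths (25 and 18 bits) follow the Virtex 5 multiplier size given above; the 48-bit accumulator width, the entity name mac, and the port names are assumptions made for illustration. Whether the compiler maps this code onto a dedicated DSP block or builds it from general logic depends on the tool and the target device.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;

ENTITY mac IS   -- hypothetical name, for illustration only
   PORT (clk, rst: IN STD_LOGIC;
         a: IN SIGNED(24 DOWNTO 0);      -- 25-bit operand
         b: IN SIGNED(17 DOWNTO 0);      -- 18-bit operand
         acc: OUT SIGNED(47 DOWNTO 0));  -- assumed 48-bit accumulator
END mac;

ARCHITECTURE rtl OF mac IS
   SIGNAL sum: SIGNED(47 DOWNTO 0);
BEGIN
   PROCESS (clk)
   BEGIN
      IF (clk'EVENT AND clk='1') THEN
         IF rst='1' THEN
            sum <= (OTHERS => '0');   -- synchronous clear
         ELSE
            -- a*b is 43 bits wide; RESIZE sign-extends it to 48 bits
            sum <= sum + RESIZE(a*b, 48);
         END IF;
      END IF;
   END PROCESS;
   acc <= sum;
END rtl;
```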


18.4.7 Clock Management 


Clock management is one of the most crucial aspects in high performance devices. It includes two main 
parts: clock distribution and clock manipulation. 

An adequate clock distribution network is necessary to minimize clock skew (that is, to avoid the 
clock reaching one section of the chip much later than it reaches others), which can be disastrous in 
synchronous systems operating within tight time windows. This type of network is constructed with 
minimum parasitic resistances and capacitances, and its layout tries to balance the distances between the 
diverse regions of the chip as best as possible. For example, the Stratix III FPGA exhibits three kinds of 
clock distribution networks, called GCLK (global clock), RCLK (regional clock), and PCLK (peripheral 
clock). The first two can be observed in Figure 18.16. The clock signals feeding these networks can come
only from external sources (through dedicated clock input pins) or from PLLs (up to 12, located near
the input clock pins, at the top, bottom, left, and right of the chip frame). Stratix III has up to 16 GCLKs,
88 RCLKs, and 116 PCLKs (a total of 220 clock networks).

Clock manipulation is the other fundamental part of clock management. To do so, PLLs (Section 14.6) 
are normally employed, which serve the following four main purposes: clock multiplication, clock divi- 
sion, phase shift, and jitter filtration. 

The Stratix III devices can have up to 12 PLLs (distributed in the positions already indicated
in Figure 18.16), whose simplified diagram is shown in Figure 18.17. Comparing it to that in
Figure 14.25, we observe that the PLLs proper are similar (note that the ÷M prescaler in the latter is
represented by ÷m in the former). However, the PLL of Figure 18.17 exhibits additional features, like
the programmable PLL seen in Figure 14.29. The Stratix III PLL includes eight shifted versions of the
generated signal, an optional ÷2 block within the PLL loop, a pre-PLL divider (÷n, similar to the ÷R of
Figure 14.29), plus several post-PLL dividers (represented by ÷c0, ÷c1, ..., ÷cx, where x can be 6 or 9).
These features allow a wide range of frequency division and multiplication factors as well as a wide
selection of clock phases.


FIGURE 18.16. (a) Global and (b) regional clock distribution networks of Stratix III.

FIGURE 18.17. Simplified diagram of the Stratix III PLLs.


Another interesting feature of this PLL is that it allows the generation of clocks with nonsymmetric
duty cycles, though only when the post-PLL dividers are employed. For example, if cx=10, then
the duty cycle of the corresponding output can be 10-90% (that is, one high, 9 low), 20-80% (2 high,
8 low), ..., 90-10% (9 high, one low). It also allows a 50-50% duty cycle when the divisor is odd, which
is achieved by switching the signal from low to high at the rising edge of the VCO clock and from high to
low at the falling edge.
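
The duty-cycle options described above can be mimicked with a simple counter-based divider, sketched below. This is not how the PLL's post-PLL dividers are actually built; it only illustrates the idea of keeping the output high for a chosen number of input periods out of every divisor periods. The entity name and the generics divisor and high_count are hypothetical. With the default values (10 and 3), the output is high for 3 input periods and low for 7.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY clk_divider IS   -- hypothetical name, for illustration only
   GENERIC (divisor: INTEGER := 10;      -- assumed division factor
            high_count: INTEGER := 3);   -- input periods with output high
   PORT (clk_in: IN STD_LOGIC;
         clk_out: OUT STD_LOGIC);
END clk_divider;

ARCHITECTURE rtl OF clk_divider IS
   SIGNAL count: INTEGER RANGE 0 TO divisor-1 := 0;
BEGIN
   PROCESS (clk_in)
   BEGIN
      IF (clk_in'EVENT AND clk_in='1') THEN
         -- Modulo-divisor counter
         IF count = divisor-1 THEN
            count <= 0;
         ELSE
            count <= count + 1;
         END IF;
         -- Registered output: high during high_count of every divisor periods
         IF count < high_count THEN
            clk_out <= '1';
         ELSE
            clk_out <= '0';
         END IF;
      END IF;
   END PROCESS;
END rtl;
```

Because this sketch reacts only to rising edges, it cannot produce a 50-50% duty cycle for an odd divisor; as the text explains, that case requires switching on both clock edges.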

The Stratix III PLLs also allow six modes of operation, depending on where the feedback clock comes 
from (these modes are defined by the multiplexer installed in the feedback loop—see inputs marked 
with (1) and (2) in Figure 18.17). If input (1) is employed, then the no-compensation mode results, while 
for the other five modes input (2) is employed. All six modes are described below. 


■ No-compensation mode: In this case, input (1) is used, so the generated clock is locally phase aligned
with the PLL reference clock (from a clock input pin). Because less circuitry is involved, this is the
option with the least jitter.

■ Normal mode: In this case, the generated clock, taken from an internal point at flip-flop clock inputs,
is phase aligned with the PLL reference clock.

■ Zero-delay buffer mode: The generated clock, taken from a clock output pin, is phase aligned with
the PLL reference clock (which enters the device), hence resulting in an apparent zero delay through
the device (the actual delay is exactly one clock period).

■ External feedback mode: A clock leaving the device is aligned with the reference clock, like above,
but taken from a point after going through external circuitry (that is, after traveling through
the PCB).

■ Source-synchronous mode: The generated clock, taken from a point feeding I/O enable registers, is
phase aligned with the PLL reference clock.

■ Source-synchronous for LVDS compensation mode: In this case, the clock input to SERDES (serializer/
deserializer) registers is phase aligned with the PLL reference clock.


Stratix III also contains four DLL (delay locked loop) units, located one in each corner of the device 
(Figure 18.15). Though DLLs can also be used for clock manipulation, they are used only for phase shift- 
ing when interfacing with external memory. More specifically, they are used for phase shifting the DQS 
(data strobe) signals for memory read and write operations. Each DLL has two outputs, allowing eight 
phase-shift settings. The combined options include the following shifts: 0°, 22.5°, 30°, 36°, 45°, 60°, 67.5°, 
72°, 90°, 108°, 120°, 135°, 144°, or 180°. These, however, depend on the chosen DLL frequency mode, 
for which there are five options. For example, for 100 MHz-167 MHz, the resolution is 22.5°, while for 
400 MHz it is 45°. 

Virtex 5 also contains a powerful clock management system. It is composed of up to six CMTs (clock 
management tiles), each constructed with two DCM (digital clock management) units and one PLL. 
Besides extensive clock control circuitry, the DCM also contains one DLL, employed for local clock 
deskew (phase shifting). As in Stratix III, the PLLs are used for clock multiplication, division, jitter
filtration, and network clock deskew. Each device contains 20 dedicated clock pins, which are connected to 32
global clock networks (GCLKs). There are also four regional clock networks (RCLKs) in each one of the 
device’s 8 to 24 regions, thus totaling 32 to 96 RCLKs. A third type of clock network also exists, which 
feeds the I/O SERDES circuits. 


18.4.8 I/O Standards 


As mentioned earlier, FPGAs support a wide selection of I/O standards, most of which are described in 
Section 10.9. 


18.4.9 Additional Features 


The Virtex 5 FPGA is offered in two main versions, called LX and LXT. What distinguishes these two 
series is the set of additional features aimed at particular types of designs. The LXT series contains a PCI 
Express interface, Ethernet blocks (four), and high-speed serial transceivers (8-16). Likewise, the Stratix 
III FPGA is offered in three main versions, called L, E, and GX. Compared to the L series, the E series has 
additional RAM and DSP blocks, while the GX series includes additional RAM blocks and high-speed 
serial transceivers. 


18.4.10 Summary and Comparison 


The table in Figure 18.18 summarizes the main features of the two FPGAs described above. To ease the 
analysis, related topics are grouped together and are presented in the same order as the descriptions 
above. It is important to mention, however, that other companies also offer FPGAs, like Atmel, Lattice, 
and QuickLogic. 

The price ranges for CPLDs and FPGAs, very roughly speaking, are the following: 


■ CPLDs: $1 to $100
■ Basic FPGAs (for example, Spartan 3, Cyclone II): $10 to $1000
■ Top FPGAs (for example, Virtex 5, Stratix III): $100 to $10,000


Feature | Xilinx Virtex 5 (LX series) | Altera Stratix III (L series)
Technology | CMOS 65nm (SRAM) | CMOS 65nm (SRAM)
Core voltage | ... | 0.9V or 1.1V
Number of LABs (Stratix) | ... | 1900-13,520
Number of ALMs (Stratix) | ... | 10 per LAB = 19,000-135,200
... (RAM block, number of blocks) | ... | M144k, ~144 kbits, 6-48
Total user RAM bits | ... | 1.8M-16.2M
Total SRAM | ... | ...
Max. SRAM frequency | 550 MHz | 600 MHz
Number of DSP blocks | 32-192 | 27-96
Number of multipliers: 18x25 (Virtex); 18x18 or 36x36 (Stratix) | ... | 216-768 or 54-192
Clock management:
Clock pins; max. number of clock networks (global, reg., periph.) | 20; 32 GCLKs, 96 RCLKs, plus I/O clocks | 32; 16 GCLKs, 88 RCLKs, 116 PCLKs
... | 12 (... deskew only) | 4 (... interface only)
Number of I/O pins | ... 1200 | ...
Supported I/O standards | All of Section 10.9 plus some | All of Section 10.9 plus some
... | ... automatic calibration | ...

(1) Smallest device, with all LABs used, all in low-power mode; no DSP or RAM.


FIGURE 18.18. Summary of Virtex 5 and Stratix III features. 


18.5 Exercises 
1. PAL versus PLA 


a. Briefly explain the main differences between PAL and PLA.


b. Why can a PLA implement more logical functions than a PAL with the same number of 
AND-OR gates? 


c. Why is a PLA slower than a PAL with the same number of AND-OR gates (of similar technology)?


2. GAL versus PAL


Briefly explain the improvements introduced by GAL over the existing PAL and PLA devices. Why 
is GAL fine for implementing sequential circuits while the other two are not? 


3. CPLD versus GAL #1 


Check the data sheets for the Xilinx XC9500 CPLD series and confirm that such CPLDs are 
constructed with n V18-type GALs, where n= 2, 4, 6, 8, 12, 16. 


4. CPLD versus GAL #2 


Check the data sheets for the Altera MAX3000 CPLD series and confirm that such CPLDs are 
constructed with n V16-type GALs, where n=2, 4, 8, 16, 32. 


5. Low-power CPLDs #1 


Check the data sheets of the Xilinx CoolRunner II CPLD and confirm that its static (standby) power 
consumption is much smaller than that of any other Xilinx CPLD (see Figure 18.10). 


6. Low-power CPLDs #2 


Check the data sheets of the Altera MAX II CPLD and confirm that its static (standby) power 
consumption is much smaller than that of any other Altera CPLD (Figure 18.11). 


7. CPLD technology #1 


Check the data sheets for the three Xilinx CPLD families (Figure 18.10) and write down the 
following: 


a. The CMOS technology employed in their fabrication. 
b. The type of memory (EEPROM, flash, etc.) used to store the interconnects. Is it nonvolatile? 
c. The power supply voltage options. 
d. The total number of flip-flops. 

8. CPLD technology #2 
Check the data sheets for the three Altera CPLD families (Figure 18.11) and write down the following: 
a. The CMOS technology employed in their fabrication. 
b. The type of memory (EEPROM, flash, etc.) used to store the interconnects. Is it nonvolatile? 
c. The power supply voltage options. 
d. The total number of flip-flops. 

9. CPLD I/Os #1 


Check the data sheets for the three Xilinx CPLD families (Figure 18.10) and write down the 
following: 


a. The I/O standards supported by each family. Which of them are in Section 10.9? 
b. The number of user I/O pins. 
c. The types of packages (see Figure 1.10).

10. CPLD I/Os #2

Check the data sheets for the three Altera CPLD families (Figure 18.11) and write down the
following:

a. The I/O standards supported by each family. Which of them are in Section 10.9?
b. The number of user I/O pins.
c. The types of packages (see Figure 1.10).

11. FPGA versus CPLD

Briefly explain the main differences between FPGAs and CPLDs.

12. FPGA technology #1

Check the data sheets for the Xilinx Virtex 5 family of FPGAs (Figure 18.18) and write down the
following:

a. The CMOS technology employed in their fabrication.
b. The type of memory (EEPROM, flash, SRAM, DRAM, etc.) used to store the interconnects. Is it
volatile or nonvolatile?
c. The power supply voltage.
d. The total number of flip-flops.
e. The number of PLLs.
f. The amount of user RAM.

13. FPGA technology #2

Check the data sheets for the Altera Stratix III family of FPGAs (Figure 18.18) and write down the
following:

a. The CMOS technology employed in their fabrication.
b. The type of memory (EEPROM, flash, SRAM, DRAM, etc.) used to store the interconnects. Is it
volatile or nonvolatile?
c. The power supply voltage.
d. The total number of flip-flops.
e. The number of PLLs.
f. The amount of user RAM.

14. FPGA I/Os #1

Check the data sheets for the Xilinx Virtex 5 family of FPGAs (Figure 18.18) and write down the
following:

a. The I/O standards supported by each family.
b. The number of user I/O pins.
c. The types of packages (see Figure 1.10).

15. FPGA I/Os #2

Check the data sheets for the Altera Stratix III family of FPGAs (Figure 18.18) and write down the
following:

a. The I/O standards supported by each family.
b. The number of user I/O pins.
c. The types of packages (see Figure 1.10).

16. Other CPLD manufacturers

As mentioned earlier, there are several other CPLD manufacturers besides those used as examples
in this chapter. Look for such manufacturers and briefly compare their devices against those in
Figures 18.10 and 18.11.

17. Other FPGA manufacturers

As mentioned earlier, there are several other FPGA manufacturers besides those used as examples
in this chapter. Look for such manufacturers and briefly compare their devices against those in
Figure 18.18.


VHDL Summary


Objective: This chapter concisely describes the VHDL language and presents some introductory 
circuit synthesis examples. Its purpose is to lay the fundamentals for the many designs that follow in 
Chapters 20 and 21, for combinational circuits, and in Chapters 22 and 23, for sequential circuits. The use 
of VHDL is concluded in Chapter 24, which introduces simulation techniques with VHDL testbenches. 
The descriptions below are very brief; for additional details, books written specifically for VHDL 
([Pedroni04a] and [Ashenden02], for example) should be consulted. 


Chapter Contents 


19.1 About VHDL
19.2 Code Structure
19.3 Fundamental VHDL Packages
19.4 Predefined Data Types
19.5 User Defined Data Types
19.6 Operators
19.7 Attributes
19.8 Concurrent versus Sequential Code
19.9 Concurrent Code (WHEN, GENERATE)
19.10 Sequential Code (IF, CASE, LOOP, WAIT)
19.11 Objects (CONSTANT, SIGNAL, VARIABLE)
19.12 Packages
19.13 Components
19.14 Functions
19.15 Procedures
19.16 VHDL Template for FSMs
19.17 Exercises


The summary presented in this chapter can be divided into two parts [Pedroni04a]. The first part, which 
encompasses Sections 19.1 to 19.11, plus Section 19.16, describes the VHDL statements and constructs 
that are intended for the main code, hence referred to as circuit-level design. The second part, covered by 
Sections 19.12 to 19.15, presents the VHDL units that are intended mainly for libraries and code
partitioning, so it is referred to as system-level design.


19.1 About VHDL


VHDL is a technology and vendor independent hardware description language. The code describes the 
behavior or structure of an electronic circuit from which a compliant physical circuit can be inferred by a 
compiler. Its main applications include synthesis of digital circuits onto CPLD/FPGA chips and layout/ 
mask generation for ASIC (application-specific integrated circuit) fabrication. 

VHDL stands for VHSIC hardware description language and resulted from an initiative funded by 
the U.S. Department of Defense in the 1980s. It was the first hardware description language standardized 
by the IEEE, through the 1076 and 1164 standards. 

VHDL allows circuit synthesis as well as circuit simulation. The former is the translation of a source 
code into a hardware structure that implements the specified functionalities; the latter is a testing pro- 
cedure to ensure that such functionalities are achieved by the synthesized circuit. In the descriptions 
that follow, the synthesizable constructs are emphasized, but an introduction to circuit simulation with 
VHDL is also included (Chapter 24). 

The following are examples of EDA (electronic design automation) tools for VHDL synthesis and 
simulation: Quartus II from Altera, ISE from Xilinx, FPGA Advantage, Leonardo Spectrum (synthesis), 
and ModelSim (simulation) from Mentor Graphics, Design Compiler RTL Synthesis from Synopsys, 
Synplify Pro from Synplicity, and Encounter RTL from Cadence. 

All design examples presented in this book were synthesized and simulated using Quartus II Web 
Edition, version 6.1 or higher, available free of charge at www.altera.com. The designs simulated using 
testbenches (Chapter 24) were processed with ModelSim-Altera Web Edition 6.1g, also available free of 
charge at the same site. 


19.2 Code Structure 


This section describes the basic structure of VHDL code, which consists of three parts: library declara- 
tions, entity, and architecture. 


Library declarations 


The library declarations section is a list of all libraries and corresponding packages that the compiler
will need to process the design. Two of them (std and work) are made visible by default. The std library
contains definitions for the standard data types (BIT, BOOLEAN, INTEGER, BIT_VECTOR, etc.), plus
information for text and file handling, while the work library is simply the location where the design
files are saved.

A package that often needs to be included in this list is std_logic_1164, from the ieee library, which
defines a nine-value logic type called STD_ULOGIC and its resolved subtype, STD_LOGIC (the latter is the
industry standard). The main advantage of STD_LOGIC over BIT is that it allows high-impedance ('Z')
and "don't care" ('-') specifications.

To declare the package above (or any other) two lines of code are needed, one to declare the library and 
the other a use clause pointing to the specific package within it, as illustrated below. 


LIBRARY ieee;
USE ieee.std_logic_1164.all;


Entity

Entity is a list with specifications of all I/O ports of the circuit under design. It also allows generic 
parameters to be declared, as well as several other declarations, subprogram bodies, and concurrent 
statements. Its syntax is shown below. 


ENTITY entity_name IS
   GENERIC (constant_name: constant_type := constant_value;
            constant_name: constant_type := constant_value;
            ...);
   PORT (port_name: signal_mode signal_type;
         port_name: signal_mode signal_type;
         ...);
   [declarative part]
[BEGIN]
   [statement part]
END entity_name;


■ GENERIC: Allows the declaration of generic constants, which can then be used anywhere in the
code, including in PORT. Example: GENERIC (number_of_bits: INTEGER := 16).

■ PORT, signal mode: IN, OUT, INOUT, BUFFER. The first two are truly unidirectional, the third is
bidirectional, and the fourth is needed when an output signal must be read internally.

■ PORT, signal type: BIT, BIT_VECTOR, INTEGER, STD_LOGIC, STD_LOGIC_VECTOR, BOOLEAN, etc.

■ Declarative part: Can contain TYPE, SUBTYPE, CONSTANT, SIGNAL, FILE, ALIAS, USE, and ATTRIBUTE
declarations, plus FUNCTION and PROCEDURE bodies. Rarely used in this way.

■ Statement part: Can contain concurrent statements (rarely used).
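
As a brief sketch of the syntax above, the entity below uses the GENERIC constant given as an example (number_of_bits) to set the width of its ports. The entity name bus_register and the port names are hypothetical, and only the entity is shown here; an architecture would be needed to complete the design.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY bus_register IS   -- hypothetical name, for illustration only
   GENERIC (number_of_bits: INTEGER := 16);   -- default width: 16 bits
   PORT (clk: IN STD_LOGIC;
         d: IN STD_LOGIC_VECTOR(number_of_bits-1 DOWNTO 0);
         q: OUT STD_LOGIC_VECTOR(number_of_bits-1 DOWNTO 0));
END bus_register;
```

Changing the value of number_of_bits is enough to resize both ports, which is the main reason for declaring such parameters as generics.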


Architecture 

This part contains the code proper (the intended circuit’s structural or behavioral description). It can be 
concurrent or sequential. The former is adequate for the design of combinational circuits, while the latter 
can be used for sequential as well as combinational circuits. Its syntax is shown below. 


ARCHITECTURE architecture_name OF entity_name IS
   [declarative part]
BEGIN
   (code)
END architecture_name;


■ Declarative part: Can contain the same items as the entity’s declarative part, plus COMPONENT and
CONFIGURATION declarations.

■ Code: Can be concurrent, sequential, or mixed. To be sequential, the statements must be placed
inside a PROCESS. However, as a whole, VHDL code is always concurrent, meaning that all of its
parts are treated in “parallel” with no precedence. Consequently, any process is compiled
concurrently with any other statements located outside it. The only other option to construct sequential
VHDL code is by means of subprograms (FUNCTION and PROCEDURE), described in Sections 19.14
and 19.15.

To write purely concurrent code, operators (Section 19.6) can be used as well as the WHEN and GENERATE
statements. To write sequential code (inside a PROCESS, FUNCTION, or PROCEDURE), the allowed
statements are IF, CASE, LOOP, and WAIT (plus operators).
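
The distinction between concurrent and sequential code can be sketched with a small mixed architecture, in which a concurrent WHEN statement selects one of two inputs and a PROCESS with an IF statement stores the result in a flip-flop. The entity name mux_reg and all signal names are hypothetical, chosen only for this illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY mux_reg IS   -- hypothetical name, for illustration only
   PORT (a, b, sel, clk: IN STD_LOGIC;
         q: OUT STD_LOGIC);
END mux_reg;

ARCHITECTURE mixed OF mux_reg IS
   SIGNAL x: STD_LOGIC;   -- internal signal (multiplexer output)
BEGIN
   -- Concurrent code: 2x1 multiplexer
   x <= a WHEN sel='0' ELSE b;
   -- Sequential code: D-type flip-flop
   PROCESS (clk)
   BEGIN
      IF (clk'EVENT AND clk='1') THEN
         q <= x;
      END IF;
   END PROCESS;
END mixed;
```

The order in which the WHEN statement and the PROCESS appear is irrelevant, because the two are treated concurrently; only the statements inside the PROCESS are evaluated in sequence.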

The code structure described above is illustrated in the example that follows. 


EXAMPLE 19.1 BUFFERED MULTIPLEXER


This introductory example shows the design of a 4x8 multiplexer (Section 11.6) whose output 
passes through a tri-state buffer (Section 4.8) controlled by an output-enable (ena) signal. The 
circuit is depicted in Figure 19.1(a), and the desired functionality is expressed in the truth table of 
Figure 19.1(b). 


FIGURE 19.1. Buffered multiplexer of Example 19.1: (a) circuit; (b) truth table.


SOLUTION 


A VHDL code for this example is presented in Figure 19.2. Note that it contains all three code 
sections described above. The additional package is precisely std_logic_1164 (lines 2 and 3), which is 
needed because the high-impedance state ("Z") is employed in this design. 

The entity is in lines 5-10, under the name buffered_mux (any name can be chosen) and contains
six input signals and one output signal (note the modes IN and OUT). Signals a to d are 8 bits
wide and of type STD_LOGIC_VECTOR, and so is y (otherwise none of the inputs a-d could be
passed to it); in a to d and y the indexing is from 7 down to 0 (in VHDL, the leftmost bit is the
MSB). The type of sel was declared as INTEGER, though it could also be STD_LOGIC_VECTOR(1
DOWNTO 0), among other options. Finally, ena was declared as STD_LOGIC (single bit), but BIT
would also do.

The architecture is in lines 12-21, with the name myarch (can be basically any name, including the
same name as the entity’s). In its declarative part (between the words ARCHITECTURE and BEGIN)
an internal signal (x) was declared, which plays the role of multiplexer output. The WHEN statement
(described later) was employed in lines 15-18 to implement the multiplexer and in lines 19 and 20
to implement the tri-state buffer. Because all eight wires that feed y must go to a high-impedance
state when ena='0', the keyword OTHERS was employed to avoid repeating 'Z' eight times (that is,
y<="ZZZZZZZZ").

Note that single quotes are used for single-bit values, while double quotes are used for multibit 
values. Observe also that lines 1, 4, 11, and 22 were included only to improve code organization and 
readability ("--" is used for comments). Finally, VHDL is not case sensitive, but to ease visualization 
capital letters were employed for reserved VHDL words. 
 1  ---------------------------------------------------------
 2  LIBRARY ieee;
 3  USE ieee.std_logic_1164.all;
 4  ---------------------------------------------------------
 5  ENTITY buffered_mux IS
 6     PORT (a, b, c, d: IN STD_LOGIC_VECTOR(7 DOWNTO 0);
 7           sel: IN INTEGER RANGE 0 TO 3;
 8           ena: IN STD_LOGIC;
 9           y: OUT STD_LOGIC_VECTOR(7 DOWNTO 0));
10  END buffered_mux;
11  ---------------------------------------------------------
12  ARCHITECTURE myarch OF buffered_mux IS
13     SIGNAL x: STD_LOGIC_VECTOR(7 DOWNTO 0);
14  BEGIN
15     x <= a WHEN sel=0 ELSE             --Mux
16          b WHEN sel=1 ELSE
17          c WHEN sel=2 ELSE
18          d;
19     y <= x WHEN ena='1' ELSE           --Tristate buffer
20          (OTHERS => 'Z');
21  END myarch;
22  ---------------------------------------------------------
FIGURE 19.3. Simulation results from the circuit (buffered mux) inferred with the code of Figure 19.2.


Simulation results obtained with the code above are shown in Figure 19.3. Note that the input
signals are preceded by an arrow with an “I” (Input) marked inside, while the arrow for the output
signal has an “O” (Output) marked inside. The inputs can be chosen freely, while the output is
calculated and plotted by the simulator. As can be seen, the circuit does behave as expected.


19.3 Fundamental VHDL Packages 


The list below shows the main VHDL packages along with their libraries of origin. These packages can
be found in the libraries that accompany your synthesis/simulation software.


Library std 


■ Package standard: Defines the basic VHDL types (BOOLEAN, BIT, BIT_VECTOR, INTEGER, NATURAL,
POSITIVE, REAL, TIME, DELAY_LENGTH, etc.) and related logical, arithmetic, comparison, and shift
operators.


Library ieee


■ Package std_logic_1164: Defines the nine-valued type STD_ULOGIC and its resolved subtype
STD_LOGIC (industry standard). Only logical operators are included, along with some
type-conversion functions.

■ Package numeric_std: Defines the numeric types SIGNED and UNSIGNED, having STD_LOGIC as the
base type. Includes also logical, arithmetic, comparison, and shift operators, plus some
type-conversion functions.


Nonstandard packages 


Both libraries above (std, ieee) are standardized by the IEEE. The packages below are not part of
the standard, but they are commonly distributed by EDA vendors along with their tools.

■ Package std_logic_arith: Defines the numeric types SIGNED and UNSIGNED, having STD_LOGIC as
the base type. Includes some arithmetic, comparison, and shift operators, plus some type-conversion
functions. This package is only partially equivalent to numeric_std.

■ Package std_logic_signed: Defines arithmetic, comparison, and some shift operators for the
STD_LOGIC_VECTOR type as if it were SIGNED.

■ Package std_logic_unsigned: Defines arithmetic, comparison, and some shift operators for the
STD_LOGIC_VECTOR type as if it were UNSIGNED.
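To make the role of these packages more concrete, the sketch below adds two STD_LOGIC_VECTOR
operands using only the standardized numeric_std package. The entity name adder_demo and its ports
are hypothetical, chosen just for this illustration. Because std_logic_1164 includes only logical
operators, the operands are first cast to UNSIGNED, for which numeric_std defines "+".

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;

ENTITY adder_demo IS
   PORT (a, b: IN STD_LOGIC_VECTOR(7 DOWNTO 0);
         sum: OUT STD_LOGIC_VECTOR(7 DOWNTO 0));
END adder_demo;

ARCHITECTURE sketch OF adder_demo IS
BEGIN
   --cast to UNSIGNED, add (8-bit result, carry-out discarded), cast back
   sum <= STD_LOGIC_VECTOR(UNSIGNED(a) + UNSIGNED(b));
END sketch;
```

With the nonstandard std_logic_unsigned package declared instead, the same addition could be written
directly as sum <= a + b, since that package treats STD_LOGIC_VECTOR as if it were UNSIGNED.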


19.4 Predefined Data Types 


The predefined data types are from the libraries/packages listed above. Those that are
synthesizable are listed in Figure 19.4, which shows their names, library/package of origin, and
synthesizable values.


Data type               Library / package             Synthesizable values
----------------------  ----------------------------  ------------------------------------------
BOOLEAN                 std / standard                TRUE, FALSE
INTEGER                 std / standard                -(2^31 - 1) to +(2^31 - 1)
NATURAL                 std / standard                0 to +(2^31 - 1)
POSITIVE                std / standard                1 to +(2^31 - 1)
CHARACTER               std / standard                256-symbol alphabet (1 byte/symbol)
STRING                  std / standard                Set of characters
REAL                    std / standard                Little or no synthesis support
STD_(U)LOGIC,           ieee / std_logic_1164         Input: '0' or 'L', '1' or 'H'
STD_(U)LOGIC_VECTOR                                   Output: '0' or 'L', '1' or 'H', '-' or 'X',
                                                      and 'Z'
UNSIGNED, SIGNED        ieee / numeric_std,           Same as STD_LOGIC
                        std_logic_arith (nonstandard)

FIGURE 19.4. Predefined synthesizable data types with respective library/package of origin and synthesizable
values.
EXAMPLE 19.2 DATA-TYPE USAGE


Consider the following signal definitions: 


SIGNAL a: BIT;
SIGNAL b: BIT_VECTOR(7 DOWNTO 0);
SIGNAL c: BIT_VECTOR(1 TO 16);
SIGNAL d: STD_LOGIC;
SIGNAL e: STD_LOGIC_VECTOR(7 DOWNTO 0);
SIGNAL f: STD_LOGIC_VECTOR(1 TO 16);
SIGNAL g: INTEGER RANGE -35 TO 35;
SIGNAL h: INTEGER RANGE 0 TO 255;
SIGNAL i: NATURAL RANGE 0 TO 255;


a. How many bits will the compiler assign to each of these signals? 
b. Explain why the assignments below are legal. 


a<=b(5);
c(16)<=b(0);
c(1)<=a;
b(5 DOWNTO 1)<=c(8 TO 12);
e(1)<=d;
e(2)<=f(16);
f(1 TO 8)<=e(7 DOWNTO 0);
b<="11110000";
a<='1';
d<='Z';
e<=(0=>'1', 7=>'0', OTHERS=>'Z');
e<="0ZZZZZZ1";                        --same as above


c. Explain why the assignments below are illegal. 


a<='Z';
a<=d;
c(16)<=e(0);
e(5 DOWNTO 1)<=c(8 TO 12);
f(5 TO 9)<=e(7 DOWNTO 2);


SOLUTION

Part (a):
1 bit for a and d, 7 bits for g, 8 bits for b, e, h, and i, and 16 bits for c and f.

Part (b):
All data types and ranges match.

Part (c):
There are type and range mismatches.

a<='Z';                               --'Z' not available for BIT
a<=d;                                 --type mismatch (BIT x STD_LOGIC)
c(16)<=e(0);                          --type mismatch
e(5 DOWNTO 1)<=c(8 TO 12);            --type mismatch
f(5 TO 9)<=e(7 DOWNTO 2);             --range mismatch
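The type mismatches of part (c) can usually be repaired with the type-conversion functions that
std_logic_1164 provides. The sketch below is one possible way of doing so, assuming a hypothetical
entity conversion_demo whose port names were invented for this illustration; to_bit and
to_stdlogicvector are functions of that package. The range mismatch is repaired simply by selecting
five elements on both sides.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY conversion_demo IS
   PORT (d: IN STD_LOGIC;
         e_in: IN STD_LOGIC_VECTOR(7 DOWNTO 0);
         c_in: IN BIT_VECTOR(1 TO 16);
         a: OUT BIT;
         c16: OUT BIT;
         e_out: OUT STD_LOGIC_VECTOR(5 DOWNTO 1);
         f_out: OUT STD_LOGIC_VECTOR(5 TO 9));
END conversion_demo;

ARCHITECTURE sketch OF conversion_demo IS
BEGIN
   a <= to_bit(d);                              --STD_LOGIC to BIT
   c16 <= to_bit(e_in(0));                      --STD_LOGIC to BIT
   e_out <= to_stdlogicvector(c_in(8 TO 12));   --BIT_VECTOR to STD_LOGIC_VECTOR
   f_out <= e_in(7 DOWNTO 3);                   --five elements on both sides
END sketch;
```

Note that to_bit converts '0' or 'L' into '0' and '1' or 'H' into '1'; any other value (such as 'Z')
is converted into the function's optional second argument, which defaults to '0'. The assignment
a<='Z' therefore has no direct repair, because BIT simply has no high-impedance value.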
19.5 User Defined Data Types


Data types can be created using the keyword TYPE. Such declarations can be done in the declarative part 
of ENTITY, ARCHITECTURE, FUNCTION, PROCEDURE, or PACKAGE. Roughly speaking, they can be divided 
into the three categories below. 

Integer-based data types Subsets of INTEGER, declared using the syntax below.

Syntax: TYPE type_name IS RANGE type_range;

Examples:
   TYPE bus_size IS RANGE 8 TO 64;
   TYPE temperature IS RANGE -5 TO 120;


Enumerated data types Employed in the design of finite state machines.

Syntax: TYPE type_name IS (state_names);

Examples:
   TYPE machine_state IS (idle, forward, backward);
   TYPE counter IS (zero, one, two, three);


Array-based data types The keywords TYPE and ARRAY are now needed, as shown in the syntax
below. They allow the creation of multi-dimensional data sets (1D, 1D x 1D, 2D, and generally also 3D
are synthesizable without restrictions).

Syntax: TYPE type_name IS ARRAY (array_range) OF data_type;

Examples of 1D arrays (single row):
   TYPE vector IS ARRAY (7 DOWNTO 0) OF STD_LOGIC;                    --1x8 array
   TYPE BIT_VECTOR IS ARRAY (NATURAL RANGE <>) OF BIT;                --unconstrained array

Examples of 1D x 1D arrays (4 rows with 8 elements each):
   TYPE array1D1D IS ARRAY (1 TO 4) OF BIT_VECTOR(7 DOWNTO 0);        --4x8 array
   TYPE vector_array IS ARRAY (1 TO 4) OF vector;                     --4x8 array

Example of 2D array (8 x 16 matrix):
   TYPE array2D IS ARRAY (1 TO 8, 1 TO 16) OF STD_LOGIC;              --8x16 array
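The difference between a 1D x 1D array and a 2D array shows up when the elements are accessed. The
sketch below illustrates this under stated assumptions: the entity array_demo, its ports, and the
constants table and matrix are all hypothetical, and the 2D type was reduced to 2 x 4 only to keep
the initial values short.

```vhdl
ENTITY array_demo IS
   PORT (sel: IN INTEGER RANGE 1 TO 4;
         row_out: OUT BIT_VECTOR(7 DOWNTO 0);
         bit_out: OUT BIT);
END array_demo;

ARCHITECTURE sketch OF array_demo IS
   TYPE array1D1D IS ARRAY (1 TO 4) OF BIT_VECTOR(7 DOWNTO 0);   --4x8 array
   TYPE array2D IS ARRAY (1 TO 2, 1 TO 4) OF BIT;                --2x4 array
   CONSTANT table: array1D1D :=
      ("00000001", "00000010", "00000100", "00001000");
   CONSTANT matrix: array2D :=
      (('0', '1', '0', '1'), ('1', '1', '0', '0'));
BEGIN
   row_out <= table(sel);       --1D x 1D: a whole row can be taken at once
   bit_out <= matrix(2, 3);     --2D: one element, selected by (row, column)
END sketch;
```

In the 1D x 1D case a single bit would be reached with two separate indexes, as in table(sel)(0),
while in the 2D case both indexes go inside the same pair of parentheses.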


19.6 Operators 


VHDL provides a series of operators that are divided into the six categories below. The last four are 
summarized in Figure 19.5, along with the allowed data types. 


Assignment operators "<=", ":=", "=>"

"<=" Used to assign values to signals.
":=" Used to assign values to variables, constants, or initial values.
"=>" Used with the keyword OTHERS to assign values to arrays.

Examples:

sig1<='1';                            --assignment to a single-bit signal
sig2<="00001111";                     --assignment to a multibit signal
sig3<=(OTHERS=>'0');                  --result is sig3<="00...0"
VARIABLE var1: INTEGER:=0;            --assignment of initial value
var2:="0101";                         --assignment to a multibit variable


Concatenation operators "&" and ","

These operators are employed to group values.

Example: The assignments to x, y, and z below are equivalent.

CONSTANT k: STD_LOGIC_VECTOR(1 TO 4):="1100";

x<=('Z', k(2 TO 3), "11111");         -- result: x<="Z1011111"
y<='Z' & k(2 TO 3) & "11111";         -- result: y<="Z1011111"
z<=(7=>'Z', 5=>'0', OTHERS=>'1');     -- result: z<="Z1011111"


Logical operators NOT, AND, NAND, OR, NOR, XOR, XNOR

The only logical operator with precedence over the others is NOT.

Examples:

x<=a NAND b;                          -- result: x=(a.b)'
y<=NOT(a AND b);                      -- result: same as above
z<=NOT a AND b;                       -- result: z=a'.b


Arithmetic operators +, -, *, /, **, ABS, REM, MOD

These are the classical operators: addition, subtraction, multiplication, division, exponentiation, absolute-value,
remainder, and modulo. They are also summarized in Figure 19.5 along with the allowed data types.

Examples:

x<=(a+b)**N;
y<=ABS(a)+ABS(b);
z<=a/(a+b);


Shift operators SLL, SRL, SLA, SRA, ROL, ROR 


Shift operators are shown in Figure 19.5 along with the allowed data types. Their meanings are described 
below. 
SLL (shift left logical): Data are shifted to the left with '0's in the empty positions. 
SRL (shift right logical): Data are shifted to the right with '0's in the empty positions. 
SLA (shift left arithmetic): Data are shifted to the left with the rightmost bit in the empty positions. 
SRA (shift right arithmetic): Data are shifted to the right with the leftmost bit in the empty positions. 
ROL (rotate left): Circular shift to the left. 
ROR (rotate right): Circular shift to the right. 


Examples:

a<="11001";
x<=a SLL 2;                           --result: x<="00100"
y<=a SLA 2;                           --result: y<="00111"
z<=a SLL -3;                          --result: z<="00011"


FIGURE 19.5. Predefined synthesizable operators and allowed predefined data types.


Comparison operators =, /=, >, <, >=, <=

Comparison operators are also shown in Figure 19.5 along with the allowed data types.

Example:

IF a>=b THEN x<='1'; END IF;


19.7 Attributes 


The main purposes of VHDL attributes are to allow the construction of generic (flexible) codes as well 
as event-driven codes. They also serve for communicating with the synthesis tool to modify synthesis 
directives. 

The predefined VHDL attributes can be divided into the following three categories: (i) range related, 
(ii) position related, and (iii) event related. All three are summarized in Figure 19.6, which also lists the 
allowed data types. 


Range-related attributes Return parameters regarding the range of a data array. 


Example: 


For the signal x specified below, the range-related attributes return the values listed subsequently 
(note that m>n). 


SIGNAL x: BIT_VECTOR(m DOWNTO n);

x'LOW             → n
x'HIGH            → m
x'LEFT            → m
x'RIGHT           → n
x'LENGTH          → m-n+1
x'RANGE           → m DOWNTO n
x'REVERSE_RANGE   → n TO m
x'ASCENDING       → FALSE (because the range of x is descending)


Category           Predefined attributes                   Supported predefined data types
-----------------  --------------------------------------  -------------------------------------
Range related      'LOW, 'HIGH, 'LEFT, 'RIGHT,             BIT_VECTOR, STD_(U)LOGIC_VECTOR,
                   'LENGTH, 'RANGE, 'ASCENDING,            INTEGER, NATURAL, POSITIVE,
                   'REVERSE_RANGE                          (UN)SIGNED
Position related   'POS, 'VAL, 'LEFTOF, 'RIGHTOF,          Enumerated data types
                   'PRED, 'SUCC
Event related      'EVENT, 'STABLE, 'LAST_VALUE            BIT, STD_(U)LOGIC, BOOLEAN

FIGURE 19.6. Predefined synthesizable attributes and allowed predefined data types.


Position-related attributes Return positional information regarding enumerated data types. For 
example, x'POS(value) returns the position of the specified value, while x'VAL(position) returns 
the value in the specified position. These attributes are also included in Figure 19.6. 


Event-related attributes Employed for monitoring signal changes (like clock transitions). The most
common of these is x'EVENT, which returns TRUE when an event (positive or negative edge) occurs in x.
The main (synthesizable) attributes in this category are also included in Figure 19.6.
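As an illustration of how range-related attributes help in writing generic code, the sketch below
reverses the bit order of a vector without mentioning its bounds anywhere in the architecture. The
entity name reverse_demo and its ports are hypothetical. Since x is declared as (N-1 DOWNTO 0),
x'HIGH is N-1 and x'LOW is 0, so the index expression reduces to N-1-i.

```vhdl
ENTITY reverse_demo IS
   GENERIC (N: INTEGER := 8);
   PORT (x: IN BIT_VECTOR(N-1 DOWNTO 0);
         y: OUT BIT_VECTOR(N-1 DOWNTO 0));
END reverse_demo;

ARCHITECTURE sketch OF reverse_demo IS
BEGIN
   gen: FOR i IN x'RANGE GENERATE
      y(i) <= x(x'HIGH + x'LOW - i);    --mirror image of position i
   END GENERATE;
END sketch;
```

If the range of x is later changed in the entity, the architecture should need no modification,
which is the main benefit of relying on attributes rather than on literal bounds.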


Other attributes 


GENERIC: This attribute was described in Section 19.2. It allows the specification of arbitrary 
constants in the code’s entity. 


ENUM_ENCODING: This is a very important attribute for state-encoding in finite-state-machine-based 
designs. Its description will be seen in Section 19.16. 


19.8 Concurrent versus Sequential Code 


Contrary to computer programs, which are sequential (serial), VHDL code is inherently concurrent (parallel). 
Therefore, all statements have the same precedence. 

Though this is fine for the design of combinational circuits, it is not for sequential ones. To circumvent 
this limitation, PROCESS, FUNCTION, or PROCEDURE can be used, which are the only pieces of VHDL code 
that are interpreted sequentially. 

In the IEEE 1076 standard, FUNCTION and PROCEDURE are collectively called subprograms. To ease any 
reference to sequential code, we will use the word subprogram in a wider sense, including PROCESS in it 
as well. 

Regarding sequential code, it is important to remember, however, that each subprogram, as a
whole, is still interpreted concurrently with any other statements or subprograms that the code
might contain.

The VHDL statements intended for concurrent code (therefore located outside subprograms) are WHEN 
and GENERATE (plus a less common statement called BLOCK), while those for sequential code (thus 
allowed only inside subprograms) are IF, CASE, LOOP, and WAIT. Operators (seen in Section 19.6) are 
allowed anywhere. 
19.9 Concurrent Code (WHEN, GENERATE)


As mentioned above, concurrent code can be constructed with the statements WHEN and GENERATE, plus 
operators. 


The WHEN statement This statement is available in two forms as shown in the syntaxes below. Two
equivalent examples (a multiplexer) are also depicted. The keyword OTHERS is useful to specify all
remaining input values, while the keyword UNAFFECTED (not employed in the examples below) can be
used when no action is to take place. (You can now go back and inspect Example 19.1.)


Syntax (WHEN-ELSE)
   assignment WHEN conditions ELSE
   assignment WHEN conditions ELSE
   ...;

Example
   x <= a WHEN sel=0 ELSE
        b WHEN sel=1 ELSE
        c;

Syntax (WITH-SELECT-WHEN)
   WITH identifier SELECT
      assignment WHEN value,
      assignment WHEN value,
      ...;

Example
   WITH sel SELECT
      x <= a WHEN 0,
           b WHEN 1,
           c WHEN OTHERS;
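To see the two forms side by side in compilable code, the sketch below places both multiplexer
fragments in one architecture. The entity name mux_demo, its ports, and the outputs x1 and x2 are
hypothetical names introduced only for this illustration.

```vhdl
ENTITY mux_demo IS
   PORT (a, b, c: IN BIT;
         sel: IN INTEGER RANGE 0 TO 3;
         x1, x2: OUT BIT);
END mux_demo;

ARCHITECTURE sketch OF mux_demo IS
BEGIN
   --WHEN-ELSE form: conditions are tested in the order written
   x1 <= a WHEN sel=0 ELSE
         b WHEN sel=1 ELSE
         c;
   --WITH-SELECT-WHEN form: every value of sel must be covered
   WITH sel SELECT
      x2 <= a WHEN 0,
            b WHEN 1,
            c WHEN OTHERS;
END sketch;
```

Both outputs are expected to behave identically. In the second form, OTHERS covers the remaining
values of sel (2 and 3), which would otherwise have to be listed explicitly.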


The GENERATE statement This statement is equivalent to the LOOP statement. However, the latter is
for sequential code, while the former is for concurrent code.


Syntax
   label: FOR identifier IN range GENERATE
      [declarations
   BEGIN]
      (concurrent assignments)
   END GENERATE [label];

Example
   G1: FOR i IN 0 TO M GENERATE
      b(i) <= a(M-i);
   END GENERATE G1;


EXAMPLE 19.3 PARITY DETECTOR


Parity detectors were studied in Section 11.7 (see Figure 11.20). Design a circuit of this type with a 
generic number of inputs. 


SOLUTION 


A VHDL code for this problem is presented below. Note that this code has only two sections because
additional libraries/packages are not needed (the data types employed in the design are all from the
package standard, which is made visible by default).

The entity, called parity_detector, is in lines 2-6. As requested, N is entered using the GENERIC
attribute (line 3), so this code can implement any size parity detector (the only change needed is in
the value of N in line 3). The input is called x (mode IN, type BIT_VECTOR), while the output is called
y (mode OUT, type BIT). The architecture is in lines 8-16. Note that GENERATE was employed (lines
12-14) along with the logical operator XOR (line 13). An internal signal, called internal, was created
in line 9 to hold the value of the XOR operations (line 13). Observe that this signal has multiple bits
because multiple assignments to the same bit are not allowed (if it were a single-bit signal, then
N assignments to it would occur, that is, one in line 11 and N-1 in line 13). Simulation results are
depicted in Figure 19.7.


FIGURE 19.7. Simulation results from the code (parity detector) of Example 19.3.


 1  ---------------------------------------------------------
 2  ENTITY parity_detector IS
 3     GENERIC (N: INTEGER:=8);          --number of bits
 4     PORT (x: IN BIT_VECTOR(N-1 DOWNTO 0);
 5           y: OUT BIT);
 6  END parity_detector;
 7  ---------------------------------------------------------
 8  ARCHITECTURE structural OF parity_detector IS
 9     SIGNAL internal: BIT_VECTOR(N-1 DOWNTO 0);
10  BEGIN
11     internal(0)<=x(0);
12     gen: FOR i IN 1 TO N-1 GENERATE
13        internal(i)<=internal(i-1) XOR x(i);
14     END GENERATE;
15     y<=internal(N-1);
16  END structural;
17  ---------------------------------------------------------


19.10 Sequential Code (IF, CASE, LOOP, WAIT) 


As mentioned in Section 19.8, VHDL code is inherently concurrent. To make it sequential, it has 
to be written inside a subprogram (that is, PROCESS, FUNCTION, or PROCEDURE, in our broader 
definition). The first is intended for use in the main code (so it will be seen in this section), while 
the other two are intended mainly for libraries (code-sharing and reusability) and will be seen in 
Sections 19.14 and 19.15. 


PROCESS 


Allows the construction of sequential code in the main code (recall that sequential code can implement
sequential and combinational circuits). Because its code is sequential, only IF, CASE, LOOP, and WAIT are
allowed (plus operators, of course). As shown in the syntax below, a process is always accompanied by
a sensitivity list (except when WAIT is employed); the process runs every time a signal included in the
sensitivity list changes (or the condition related to WAIT is fulfilled). In the declarative part (between the
words PROCESS and BEGIN), variables can be specified. The label is optional.


[label:] PROCESS (sensitivity list)
   [declarative part]
BEGIN
   (sequential code)
END PROCESS [label];


Example

PROCESS (clk)
BEGIN
   IF clk'EVENT AND clk='1' THEN
      q <= d;
   END IF;
END PROCESS;


The IF statement This is the most commonly used of all VHDL statements. Its syntax is shown below. 


IF conditions THEN
   assignments;
ELSIF conditions THEN
   assignments;
...
ELSE
   assignments;
END IF;


Example

IF (x=a AND y=b) THEN
   output <= '0';
ELSIF (x=a AND y=c) THEN
   output <= '1';
ELSE
   output <= 'Z';
END IF;


The WAIT statement This statement is somewhat similar to IF, with more than one form available. 
Contrary to when IF, CASE, or LOOP are used, the process cannot have a sensitivity list when WAIT is 
employed. The WAIT UNTIL statement accepts only one signal, while WAIT ON accepts several. WAIT 
FOR is for simulations only. All three syntaxes are shown below. 


Syntax (WAIT UNTIL)
   WAIT UNTIL signal_condition;

Example
   WAIT UNTIL clk'EVENT AND clk='1';

Syntax (WAIT ON)
   WAIT ON signal1 [, signal2, ...];

Example
   WAIT ON clk, rst;

Syntax (WAIT FOR)
   WAIT FOR time;

Example
   WAIT FOR 5 ns;
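The relationship between WAIT and IF can be seen by rewriting a flip-flop process with WAIT UNTIL.
The sketch below assumes a hypothetical entity dff_wait with ports d, clk, and q; it is meant only
to show where the statement goes and why the sensitivity list disappears.

```vhdl
ENTITY dff_wait IS
   PORT (d, clk: IN BIT;
         q: OUT BIT);
END dff_wait;

ARCHITECTURE sketch OF dff_wait IS
BEGIN
   PROCESS                                 --no sensitivity list because WAIT is used
   BEGIN
      WAIT UNTIL clk'EVENT AND clk='1';    --suspend until a rising clock edge
      q <= d;                              --then store d
   END PROCESS;
END sketch;
```

This process is expected to infer the same D-type flip-flop as a PROCESS (clk) containing
IF clk'EVENT AND clk='1' THEN q <= d; END IF; the choice between the two is largely a matter of
style.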


The CASE statement Even though CASE can only be used in sequential code, its fundamental role is to 
allow the easy creation of combinational circuits (more specifically, of truth tables), so it is the sequential 
counterpart of the concurrent statement WHEN. When CASE is employed, all input values must be tested, 
so the keyword OTHERS can be helpful, as shown in the example below (multiplexer). 


Syntax
   CASE identifier IS
      WHEN value => assignments;
      WHEN value => assignments;
      ...
   END CASE;

Example
   CASE sel IS
      WHEN 0 => y<=a;
      WHEN 1 => y<=b;
      WHEN OTHERS => y<=c;
   END CASE;

The LOOP statement Allows the creation of multiple instances of the same assignments. It is the
sequential counterpart of the concurrent statement GENERATE. However, as shown below, there are four
options involving LOOP.


Syntax (FOR-LOOP)
   [label:] FOR identifier IN range LOOP
      (sequential statements)
   END LOOP [label];

Example
   FOR i IN x'RANGE LOOP
      x(i) <= a(M-i) AND b(i);
   END LOOP;

Syntax (WHILE-LOOP)
   [label:] WHILE condition LOOP
      (sequential statements)
   END LOOP [label];

Example
   WHILE i<M LOOP
      ...
   END LOOP;

Syntax (LOOP with EXIT)
   LOOP
      ...
      [label:] EXIT [loop_label] [WHEN condition];
      ...
   END LOOP;

Example
   temp := 0;
   FOR i IN N-1 DOWNTO 0 LOOP
      EXIT WHEN x(i)='1';
      temp := temp + 1;
   END LOOP;

Syntax (LOOP with NEXT)
   LOOP
      ...
      [label:] NEXT [loop_label] [WHEN condition];
      ...
   END LOOP;

Example
   temp := 0;
   FOR i IN N-1 DOWNTO 0 LOOP
      NEXT WHEN x(i)='1';
      temp := temp + 1;
   END LOOP;


Note in the example with EXIT above that the code counts the number of leading zeros in an N-bit vector 
(starting with the MSB), while in the example with NEXT it counts the total number of zeros in the vector. 


EXAMPLE 19.4 LEADING ZEROS


As mentioned above, the code presented in the example with EXIT counts the total number of
leading zeros in an N-bit vector, starting from the left (MSB). Write the complete code for that problem.


SOLUTION 


The corresponding code is shown below. Again additional library/package declarations are not 
needed. The entity is in lines 2-6, and the name chosen for it is leading_zeros. Note that the number 
of bits was again declared using the GENERIC statement (line 3), thus causing this solution to be fine 
for any vector size. The architecture is in lines 8-20 under the name behavioral. Note that a process 
was needed to use a variable (temp), which, contrary to a signal, does accept multiple assignments 
(lines 13 and 16). The EXIT statement (line 15) was employed to quit the loop when a '1' is found. 
The value of temp (a variable) is eventually passed to y (a signal) at the end of the process (line 18). 
Simulation results are depicted in Figure 19.8. 

FIGURE 19.8. Simulation results from the code (leading zeros) of Example 19.4.


 1  ---------------------------------------------------------
 2  ENTITY leading_zeros IS
 3     GENERIC (N: INTEGER:=8);
 4     PORT (x: IN BIT_VECTOR(N-1 DOWNTO 0);
 5           y: OUT INTEGER RANGE 0 TO N);
 6  END leading_zeros;
 7  ---------------------------------------------------------
 8  ARCHITECTURE behavioral OF leading_zeros IS
 9  BEGIN
10     PROCESS (x)
11        VARIABLE temp: INTEGER RANGE 0 TO N;
12     BEGIN
13        temp:=0;
14        FOR i IN x'RANGE LOOP
15           EXIT WHEN x(i)='1';
16           temp:=temp+1;
17        END LOOP;
18        y<=temp;
19     END PROCESS;
20  END behavioral;
21  ---------------------------------------------------------
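For comparison, the NEXT fragment shown earlier (which counts all zeros rather than only the leading
ones) can be completed in the same way. The sketch below is an adaptation of the code above, with
the hypothetical entity name total_zeros; the only functional change is the replacement of EXIT
with NEXT.

```vhdl
ENTITY total_zeros IS
   GENERIC (N: INTEGER := 8);
   PORT (x: IN BIT_VECTOR(N-1 DOWNTO 0);
         y: OUT INTEGER RANGE 0 TO N);
END total_zeros;

ARCHITECTURE sketch OF total_zeros IS
BEGIN
   PROCESS (x)
      VARIABLE temp: INTEGER RANGE 0 TO N;
   BEGIN
      temp := 0;
      FOR i IN x'RANGE LOOP
         NEXT WHEN x(i)='1';     --skip the increment, but keep looping
         temp := temp + 1;       --reached only when x(i)='0'
      END LOOP;
      y <= temp;
   END PROCESS;
END sketch;
```

With EXIT the loop is abandoned at the first '1', whereas with NEXT only the current iteration is
abandoned, so every bit of x is still inspected.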


19.11 Objects (CONSTANT, SIGNAL, VARIABLE) 
There are three kinds of objects in VHDL: CONSTANT, SIGNAL, and VARIABLE. 


CONSTANT A constant can be declared and used basically anywhere (entity, architecture, package, 
component, block, configuration, and subprograms). Its syntax is shown below. (Constants can also be 
declared using the GENERIC statement seen in Section 19.2.2.) 


Syntax: CONSTANT constant_name: constant_type := constant_value; 
Examples: 


CONSTANT number_of_bits: INTEGER:=16; 
CONSTANT mask: BIT_VECTOR(31 DOWNTO 0) :=(OTHERS=>'1'); 

SIGNAL Signals define circuit I/Os and internal wires. They can be declared in the same places
as constants, with the exception of subprograms (though they can be used there). When a signal is
used in a subprogram, its value is only updated at the conclusion of the subprogram run. Moreover,
a signal does not accept multiple assignments. The syntax is shown below. The initial value is
ignored during synthesis.


Syntax: SIGNAL signal_name: signal_type [range] [:= initial_value]; 


Examples: 


SIGNAL seconds: INTEGER RANGE 0 TO 59; 
SIGNAL enable: BIT; 
SIGNAL my_data: STD_LOGIC_VECTOR(1 TO 8) :="00001111"; 


VARIABLE Variables can only be declared and used in subprograms (PROCESS, FUNCTION, or
PROCEDURE in our broader definition), so they represent only local information (except in the case
of shared variables). On the other hand, their update is immediate, and multiple assignments are
allowed. The syntax is shown below. The initial value is again only for simulations.


Syntax: VARIABLE variable_name: variable_type [range] [:= initial_value]; 
Examples: 


VARIABLE seconds: INTEGER RANGE 0 TO 59; 
VARIABLE enable: BIT; 
VARIABLE my_data: STD_LOGIC_VECTOR(1 TO 8) :="00001111"; 


SIGNAL versus VARIABLE The distinction between signals and variables and their correct usage are 
fundamental to the writing of efficient (and correct) VHDL code. Their differences are summarized 
in Figure 19.9 by means of six fundamental rules [Pedroni04a] and are illustrated in the example that 
follows. 


1. Place of declaration
   SIGNAL:   In any VHDL unit, except subprograms.
   VARIABLE: Only in subprograms (PROCESS, FUNCTION, or PROCEDURE).

2. Scope
   SIGNAL:   Can be global (available to the whole code).
   VARIABLE: Always local (visible only inside the subprogram), except for shared variables.

3. Update
   SIGNAL:   New value available only at the end of the subprogram run.
   VARIABLE: Updated immediately (new value can be used in the next line of code).

4. Assignment operator
   SIGNAL:   Values are assigned using "<=" (example: sig<=5;).
   VARIABLE: Values are assigned using ":=" (example: var:=5;).

5. Multiple assignments
   SIGNAL:   Only one assignment is allowed.
   VARIABLE: Accepts multiple assignments (because update is immediate).

6. Inference of registers
   SIGNAL:   Flip-flops are inferred when an assignment to a signal occurs at the transition of
             another signal.
   VARIABLE: Flip-flops are inferred when an assignment to a variable occurs at the transition of
             another signal and this value is eventually passed to a signal.


FIGURE 19.9. Comparison between SIGNAL and VARIABLE.
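Rule 3 is probably the one that causes most surprises in practice. The sketch below tries to make
it visible by sending the same input d through a signal (s) and through a variable (v) inside one
clocked process. The entity name sig_var_demo and all of its ports are hypothetical.

```vhdl
ENTITY sig_var_demo IS
   PORT (clk, d: IN BIT;
         q_sig, q_var: OUT BIT);
END sig_var_demo;

ARCHITECTURE sketch OF sig_var_demo IS
   SIGNAL s: BIT;
BEGIN
   PROCESS (clk)
      VARIABLE v: BIT;
   BEGIN
      IF clk'EVENT AND clk='1' THEN
         s <= d;          --s keeps its old value until the process run ends
         q_sig <= s;      --so q_sig receives the previous value of s
         v := d;          --v is updated immediately
         q_var <= v;      --so q_var receives the current value of d
      END IF;
   END PROCESS;
END sketch;
```

Under these assumptions, q_var should follow d with a delay of one clock edge (one flip-flop), while
q_sig should follow d with a delay of two clock edges (two flip-flops in series), even though the
two pairs of assignments look alike.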

EXAMPLE 19.5 COUNTER (FLIP-FLOP INFERENCE)


This example illustrates the use of the rules presented in Figure 19.9, particularly rule 6, which 
deals with the inference of flip-flops. Design a 0-to-9 counter, then simulate it and also check the 
number of flip-flops inferred by the compiler. Recall from Section 14.2 that four DFFs are expected 
in this case. 


SOLUTION 


The corresponding VHDL code is shown below. Note the use of rule 6 in lines 12, 13, and 18. Because
a value is assigned to a variable (temp, line 13) at the transition of another signal (clk, line 12) and
that variable’s value is eventually passed to a signal (count, line 18), flip-flops are expected to be
inferred. Looking at the compilation reports one will find that four registers were inferred. Note also
the use of other rules from Figure 19.9, like rule 3; the test in line 14 is fine only because the update of
a variable is immediate, so the value assigned in line 13 is ready for testing in the next line of code.
Simulation results are displayed in Figure 19.10.


 1  ---------------------------------------------------------
 2  ENTITY counter IS
 3     PORT (clk: IN BIT;
 4           count: OUT INTEGER RANGE 0 TO 9);
 5  END counter;
 6  ---------------------------------------------------------
 7  ARCHITECTURE counter OF counter IS
 8  BEGIN
 9     PROCESS (clk)
10        VARIABLE temp: INTEGER RANGE 0 TO 10;
11     BEGIN
12        IF (clk'EVENT AND clk='1') THEN
13           temp:=temp+1;
14           IF (temp=10) THEN
15              temp:=0;
16           END IF;
17        END IF;
18        count<=temp;
19     END PROCESS;
20  END counter;
21  ---------------------------------------------------------


FIGURE 19.10. Simulation results from the code (0-to-9 counter) of Example 19.5.

Main code: library declarations, ENTITY, ARCHITECTURE.
Library: PACKAGE (declarations of COMPONENT, SIGNAL, CONSTANT, FUNCTION, PROCEDURE, etc.),
PACKAGE BODY (body of FUNCTION and PROCEDURE), and other designs (instantiated using COMPONENT).

FIGURE 19.11. Relationship between the main code and the units intended mainly for system-level design
(located in libraries).


19.12 Packages 


We have concluded the description of the VHDL units that are intended for the main code, and we 
turn now to those that are intended mainly for libraries (system-level design); these are PACKAGE, 
COMPONENT, FUNCTION, and PROCEDURE. The relationship between them and the main code is illustrated 
in Figure 19.11 [Pedroni04a]. 

A package can be used for two purposes: (i) to make declarations and (ii) to describe global functions
and procedures. To construct it, two sections of code might be needed, called PACKAGE and PACKAGE
BODY (see syntax below). The former contains only declarations, while the latter is needed when a
function or procedure is declared in the former, in which case it must contain the full description
(body) of the declared subprogram(s). The two parts must have the same name.


PACKAGE package_name IS
   (declarations)
END package_name;

[PACKAGE BODY package_name IS
   (FUNCTION and PROCEDURE descriptions)
END package_name;]


EXAMPLE 19.6 PACKAGE WITH A FUNCTION


The PACKAGE below (called my_package) contains three declarations (one constant, one type, and one 
function). Because a function declaration is present, a PACKAGE BODY is needed, in which the whole 
function is described (details on how to write functions will be seen in Section 19.14). 


---------------------------------------------------------------------
PACKAGE my_package IS
   CONSTANT carry: BIT:='1';
   TYPE machine_state IS (idle, forward, backward);
   FUNCTION convert_integer (SIGNAL S: BIT_VECTOR) RETURN INTEGER;
END my_package;
---------------------------------------------------------------------
PACKAGE BODY my_package IS
   FUNCTION convert_integer (SIGNAL S: BIT_VECTOR) RETURN INTEGER IS
   BEGIN
      ...(function body)...
   END convert_integer;
END my_package;
---------------------------------------------------------------------
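Once compiled, a package is made visible to the main code by means of a USE clause. The sketch
below shows this under stated assumptions: demo_package, package_user, the constant width, and the
signal state are hypothetical names, and the package is assumed to be compiled into the default
work library. Because this package declares no function or procedure, no PACKAGE BODY is needed.

```vhdl
PACKAGE demo_package IS
   CONSTANT width: INTEGER := 8;
   TYPE machine_state IS (idle, forward, backward);
END demo_package;

USE work.demo_package.all;      --makes width and machine_state visible below

ENTITY package_user IS
   PORT (x: IN BIT_VECTOR(width-1 DOWNTO 0);
         y: OUT BIT_VECTOR(width-1 DOWNTO 0));
END package_user;

ARCHITECTURE sketch OF package_user IS
   SIGNAL state: machine_state := idle;    --type taken from the package
BEGIN
   y <= x WHEN state=idle ELSE (OTHERS => '0');
END sketch;
```

In this sketch state never leaves idle, so y simply follows x; the only purpose of the signal is to
show a type declared in the package being used in the main code.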


19.13 Components 


COMPONENT is simply a piece of conventional code (that is, library declarations, entity, and architecture). 
However, the declaration of a code as a component allows reusability and also the construction of 
hierarchical designs. Commonly used digital subsystems (adders, multipliers, multiplexers, etc.) are often 
compiled using this technique. 

A component can be declared in an ARCHITECTURE, PACKAGE, GENERATE, or BLOCK and instantiated in any
of these except a PACKAGE, which holds only declarations. Its syntax contains two parts, one for the
declaration and another for the instantiation, as shown below.


COMPONENT component_name IS
   PORT (port_name: signal_mode signal_type;
         port_name: signal_mode signal_type;
         ...);
END COMPONENT;

label: [COMPONENT] component_name PORT MAP (port list);


As can be seen above, a COMPONENT declaration is similar to an ENTITY declaration. The second part 
(component instantiation) requires a label, followed by the (optional) word COMPONENT, then the com- 
ponent’s name, and finally a PORT MAP declaration, which is simply a list relating the ports of the actual 
circuit to the ports of the predesigned component that is being instantiated. This mapping can be posi- 
tional or nominal, as illustrated below. 


------ Component declaration: ------------------------
COMPONENT nand_gate IS
   PORT (a, b: IN BIT; c: OUT BIT);
END COMPONENT;

------ Component instantiations: ---------------------
nand1: nand_gate PORT MAP (x, y, z); --positional mapping
nand2: nand_gate PORT MAP (a=>x, b=>y, c=>z); --nominal mapping


The two traditional ways of using a component (which, as seen above, is a conventional piece of 
VHDL code that has been compiled into a certain library) are: 


i. With the component declared in a package (also located in a library) and instantiated in the main 
code; 


ii. With the component declared and instantiated in the main code. 


An example using method (ii) is shown below. 
EXAMPLE 19.7 CARRY-RIPPLE ADDER WITH COMPONENT


Adders were studied in Sections 12.2-12.4. The carry-ripple adder of Figure 12.3(b) is repeated in
Figure 19.12 with a generic number of stages (N). Design this adder using COMPONENT to instantiate
the N full-adder units.


FIGURE 19.12. Carry-ripple adder of Example 19.7. 


SOLUTION 


A VHDL code for this problem is shown below. Note that the full-adder unit, which will be instantiated
as a COMPONENT in the main code, was designed separately (it is a conventional piece of VHDL code).
In this example, the component was declared in the main code (in the architecture's declarative part,
lines 13-15; note that it is simply a copy of the component's entity). The instantiation occurs in
line 20. Because N is generic, the GENERATE statement was employed to create multiple instances of
the component. The label chosen for the component is FA, and the mapping is positional. Simulation
results are shown in Figure 19.13.


1  ------ The component: -------------------------------
2  ENTITY full_adder IS
3     PORT (a, b, cin: IN BIT;
4           s, cout: OUT BIT);
5  END full_adder;
6  -----------------------------------------------------
7  ARCHITECTURE full_adder OF full_adder IS
8  BEGIN
9     s<=a XOR b XOR cin;
10    cout<=(a AND b) OR (a AND cin) OR (b AND cin);
11 END full_adder;
12 -----------------------------------------------------

1  ------ Main code: -----------------------------------
2  ENTITY carry_ripple_adder IS
3     GENERIC (N : INTEGER:=8); --number of bits
4     PORT (a, b: IN BIT_VECTOR(N-1 DOWNTO 0);
5           cin: IN BIT;
6           s: OUT BIT_VECTOR(N-1 DOWNTO 0);
7           cout: OUT BIT);
8  END carry_ripple_adder;
9  -----------------------------------------------------
10 ARCHITECTURE structural OF carry_ripple_adder IS
11    SIGNAL carry: BIT_VECTOR(N DOWNTO 0);
12    --------------------------------------------------
13    COMPONENT full_adder IS
14       PORT (a, b, cin: IN BIT; s, cout: OUT BIT);
15    END COMPONENT;
16    --------------------------------------------------
17 BEGIN
18    carry(0)<=cin;
19    gen_adder: FOR i IN a'RANGE GENERATE
20       FA: full_adder PORT MAP (a(i), b(i), carry(i), s(i), carry(i+1));
21    END GENERATE;
22    cout<=carry(N);
23 END structural;
24 -----------------------------------------------------
FIGURE 19.13. Simulation results from the code (carry-ripple adder) of Example 19.7.
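
Method (i), with the component declared in a package rather than in the main code, can be sketched
for the same adder as follows. This is only an illustrative sketch: the package name my_components is
hypothetical, and all design units are assumed to be compiled into the work library.

```vhdl
------ The component (same full adder as in Example 19.7): ------
ENTITY full_adder IS
   PORT (a, b, cin: IN BIT;
         s, cout: OUT BIT);
END full_adder;

ARCHITECTURE full_adder OF full_adder IS
BEGIN
   s<=a XOR b XOR cin;
   cout<=(a AND b) OR (a AND cin) OR (b AND cin);
END full_adder;

------ Package holding the component declaration: ------
PACKAGE my_components IS   --hypothetical package name
   COMPONENT full_adder IS
      PORT (a, b, cin: IN BIT; s, cout: OUT BIT);
   END COMPONENT;
END my_components;

------ Main code: ------
USE work.my_components.all;   --makes the component declaration visible

ENTITY carry_ripple_adder IS
   GENERIC (N: INTEGER:=8);   --number of bits
   PORT (a, b: IN BIT_VECTOR(N-1 DOWNTO 0);
         cin: IN BIT;
         s: OUT BIT_VECTOR(N-1 DOWNTO 0);
         cout: OUT BIT);
END carry_ripple_adder;

ARCHITECTURE structural OF carry_ripple_adder IS
   SIGNAL carry: BIT_VECTOR(N DOWNTO 0);
   --no COMPONENT declaration is needed here; it comes from the package
BEGIN
   carry(0)<=cin;
   gen_adder: FOR i IN a'RANGE GENERATE
      FA: full_adder PORT MAP (a(i), b(i), carry(i), s(i), carry(i+1));
   END GENERATE;
   cout<=carry(N);
END structural;
```

The only change with respect to method (ii) is where the declaration resides, so the same package can
serve any number of designs that instantiate full_adder.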


GENERIC MAP Completely generic code (for libraries) can be attained using the GENERIC 
attribute (Section 19.2). When a component containing such an attribute is instantiated, the value 
originally given to the generic parameter can be overwritten by including a GENERIC MAP decla- 
ration in the component instantiation. The new syntax for the component instantiation is shown 
below. 


label: [COMPONENT] comp_name GENERIC MAP (parameter list) PORT MAP (port list);


Example: 


------ Component declaration: ------------------------
COMPONENT xor_gate IS
   GENERIC (N: INTEGER:=8);
   PORT (inp: IN BIT_VECTOR(1 TO N); outp: OUT BIT);
END COMPONENT;

------ Component instantiation: ----------------------
gate1: xor_gate GENERIC MAP (N=>16) PORT MAP (inp=>x, outp=>y); --Nominal map.
gate2: xor_gate GENERIC MAP (16) PORT MAP (x, y); --Positional mapping
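
To see GENERIC MAP in a complete design, the fragment above can be expanded as sketched below. The
architecture of xor_gate (an N-input XOR built with a LOOP) and the top-level name parity16 are
assumptions made only for illustration; they do not appear in the text.

```vhdl
------ The component: generic N-input XOR gate (assumed implementation) ------
ENTITY xor_gate IS
   GENERIC (N: INTEGER:=8);   --default value, used when no GENERIC MAP is given
   PORT (inp: IN BIT_VECTOR(1 TO N);
         outp: OUT BIT);
END xor_gate;

ARCHITECTURE behavior OF xor_gate IS
BEGIN
   PROCESS (inp)
      VARIABLE temp: BIT;
   BEGIN
      temp:='0';
      FOR i IN inp'RANGE LOOP
         temp:=temp XOR inp(i);   --accumulates the XOR of all input bits
      END LOOP;
      outp<=temp;
   END PROCESS;
END behavior;

------ Main code: 16-input instance of the component ------
ENTITY parity16 IS   --hypothetical top-level name
   PORT (x: IN BIT_VECTOR(1 TO 16);
         y: OUT BIT);
END parity16;

ARCHITECTURE structural OF parity16 IS
   COMPONENT xor_gate IS
      GENERIC (N: INTEGER:=8);
      PORT (inp: IN BIT_VECTOR(1 TO N); outp: OUT BIT);
   END COMPONENT;
BEGIN
   --GENERIC MAP overwrites the default N=8 with N=16 for this instance only
   gate1: xor_gate GENERIC MAP (N=>16) PORT MAP (inp=>x, outp=>y);
END structural;
```

Without the GENERIC MAP, the instance would keep N=8 and the 16-bit signal x would no longer match the
width of inp.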
19.14 Functions


PROCESS, FUNCTION, and PROCEDURE are the three types of VHDL subprograms (in our broader 
definition). The first is intended mainly for the main code, while the other two are generally located in 
packages/libraries (for reusability and code sharing). 

The code inside a function is sequential, so only sequential VHDL statements can be used (IF, CASE,
and LOOP; WAIT, although sequential, is not allowed inside a function). Other prohibitions are signal
declarations and component instantiations. The syntax of FUNCTION is shown below.


FUNCTION function_name [parameters] RETURN data_type IS 
[declarative part] 

BEGIN 
(sequential code) 

END function_name; 


Zero or more parameters can be passed to a function. However, it must always return a single value. 
The parameters, when passed, can be only CONSTANT (default) or SIGNAL (VARIABLE is not allowed), 
declared in the following way: 


[CONSTANT] constant_name: constant_type; 
SIGNAL signal_name: signal_type; 


A function can be called basically anywhere (in combinational as well as sequential code, inside 
subprograms, etc.). Its construction (using the syntax above), on the other hand, can be done in the fol- 
lowing places: (i) in a package, (ii) in the declarative part of an entity, (iii) in the declarative part of an 
architecture, (iv) in the declarative part of a subprogram. Because of reusability and code sharing, option 
(i) is by far the most popular (illustrated in the example below). 


EXAMPLE 19.8 FUNCTION SHIFT_INTEGER


Write a function capable of logically shifting an INTEGER to the left or to the right. Two signals
should be passed to the function, called input and shift, where the former is the signal to be
shifted, while the latter is the desired amount of shift. If shift>0, then the vector should be
shifted shift positions to the left; otherwise, if shift<0, then the vector should be shifted |shift|
positions to the right. Test your function with it located in a PACKAGE (plus PACKAGE BODY, of
course).


SOLUTION 


A VHDL code for this problem is shown below. The function (called shift_integer) is declared
in line 3 of a PACKAGE (called my_package) and constructed in lines 7-10 of the respective
PACKAGE BODY. The function receives signals a and b, both of type INTEGER, and also returns
an INTEGER. Note that only one line of actual code (line 9) is needed to create the
desired shifts.

The main code is also shown below. Note that the package described above must be included 
in the library declarations portion of the main code (see line 2). As can be seen in the entity, 6-bit 
signals were employed to illustrate the function operation. A call is made in line 12 with input and 
shift passed to the function, which returns a value for output (observe the operation of this code in 
Figure 19.14). 
FIGURE 19.14. Simulation results from the code (function shift_integer) of Example 19.8.


1  ------ Package: -------------------------------------
2  PACKAGE my_package IS
3     FUNCTION shift_integer (SIGNAL a, b: INTEGER) RETURN INTEGER;
4  END my_package;
5  -----------------------------------------------------
6  PACKAGE BODY my_package IS
7     FUNCTION shift_integer (SIGNAL a, b: INTEGER) RETURN INTEGER IS
8     BEGIN
9        RETURN a*(2**b);
10    END shift_integer;
11 END my_package;

1  ------ Main code: -----------------------------------
2  USE work.my_package.all;
3  -----------------------------------------------------
4  ENTITY shifter IS
5     PORT (input: IN INTEGER RANGE 0 TO 63;
6           shift: IN INTEGER RANGE -6 TO 6;
7           output: OUT INTEGER RANGE 0 TO 63);
8  END shifter;
9  -----------------------------------------------------
10 ARCHITECTURE shifter OF shifter IS
11 BEGIN
12    output<=shift_integer(input, shift);
13 END shifter;
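
The expression a*(2**b) depends on the tool accepting a negative exponent. The VHDL standard allows a
negative exponent only when the left operand is of a floating-point type, so a strict simulator may
flag the call when shift<0. A possible variant that avoids this is sketched below, assuming that the
value to be shifted is nonnegative (as in the entity above); the package name my_shift_package is
hypothetical.

```vhdl
PACKAGE my_shift_package IS   --hypothetical package name
   FUNCTION shift_integer (SIGNAL a, b: INTEGER) RETURN INTEGER;
END my_shift_package;

PACKAGE BODY my_shift_package IS
   FUNCTION shift_integer (SIGNAL a, b: INTEGER) RETURN INTEGER IS
   BEGIN
      IF (b>=0) THEN
         RETURN a*(2**b);      --shift left by b positions
      ELSE
         RETURN a/(2**(-b));   --shift right by |b| positions (assumes a>=0)
      END IF;
   END shift_integer;
END my_shift_package;
```

The call in the main code remains output<=shift_integer(input, shift); only the USE clause would
have to name the new package.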


19.15 Procedures 


The purpose, construction, and usage of PROCEDURE are similar to those of FUNCTION. Its syntax is 
shown below. 


PROCEDURE procedure_name [parameters] IS 
[declarative part] 

BEGIN 
(sequential code) 

END procedure_name; 


The parameters in the syntax above can contain CONSTANT, SIGNAL, or VARIABLE, accompanied by 
their respective mode, which can be only IN, OUT, or INOUT. Their full specification is as follows. 
CONSTANT constant_name: constant_mode constant_type;
SIGNAL signal_name: signal_mode signal_type;
VARIABLE variable_name: variable_mode variable_type;


The fundamental differences between FUNCTION and PROCEDURE are the following: (i) a procedure 
can return more than one value, whereas a function must return exactly one; (ii) variables can be passed 
to procedures, which are forbidden for functions; and (iii) while a function is called as part of an expres- 
sion, a procedure call is a statement on its own. 


EXAMPLE 19.9 PROCEDURE SORT_DATA


Write a procedure (called sort_data) that sorts two signed decimal values. Test it with the procedure 
located in a PACKAGE. 


SOLUTION 


A VHDL code for this problem is shown below. As requested, the procedure (sort_data) is located in 
a package (called my_package). This procedure (which returns two values) is declared in lines 3-4 of a 
PACKAGE and is constructed in lines 8-18 of the corresponding PACKAGE BODY. In the main code, note 
the inclusion of my_package in line 2. Observe also in line 11 that the procedure call is a statement on 
its own. Simulation results are depicted in Figure 19.15. 
FIGURE 19.15. Simulation results from the code (procedure sort_data) of Example 19.9.


1  ------ Package: -------------------------------------
2  PACKAGE my_package IS
3     PROCEDURE sort_data (SIGNAL in1, in2: IN INTEGER;
4                          SIGNAL out1, out2: OUT INTEGER);
5  END my_package;
6  -----------------------------------------------------
7  PACKAGE BODY my_package IS
8     PROCEDURE sort_data (SIGNAL in1, in2: IN INTEGER;
9                          SIGNAL out1, out2: OUT INTEGER) IS
10    BEGIN
11       IF (in1<in2) THEN
12          out1<=in1;
13          out2<=in2;
14       ELSE
15          out1<=in2;
16          out2<=in1;
17       END IF;
18    END sort_data;
19 END my_package;
20 -----------------------------------------------------

1  ------ Main code: -----------------------------------
2  USE work.my_package.all;
3  -----------------------------------------------------
4  ENTITY sorter IS
5     PORT (a, b: IN INTEGER RANGE -128 TO 127;
6           x, y: OUT INTEGER RANGE -128 TO 127);
7  END sorter;
8  -----------------------------------------------------
9  ARCHITECTURE sorter OF sorter IS
10 BEGIN
11    sort_data (a, b, x, y);
12 END sorter;
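
Difference (ii) between functions and procedures (variables can be passed to procedures) is not
exercised by Example 19.9, which uses only signals. A minimal sketch with VARIABLE parameters of mode
INOUT is given below; the names my_procedures, sort_vars, and sorter_var are hypothetical and were
chosen only for this illustration.

```vhdl
PACKAGE my_procedures IS   --hypothetical package name
   PROCEDURE sort_vars (VARIABLE v1, v2: INOUT INTEGER);
END my_procedures;

PACKAGE BODY my_procedures IS
   PROCEDURE sort_vars (VARIABLE v1, v2: INOUT INTEGER) IS
      VARIABLE temp: INTEGER;
   BEGIN
      IF (v1>v2) THEN   --swap so that v1<=v2 on return
         temp:=v1;
         v1:=v2;
         v2:=temp;
      END IF;
   END sort_vars;
END my_procedures;

USE work.my_procedures.all;

ENTITY sorter_var IS
   PORT (a, b: IN INTEGER RANGE -128 TO 127;
         x, y: OUT INTEGER RANGE -128 TO 127);
END sorter_var;

ARCHITECTURE sorter_var OF sorter_var IS
BEGIN
   PROCESS (a, b)
      VARIABLE lo, hi: INTEGER RANGE -128 TO 127;
   BEGIN
      lo:=a;
      hi:=b;
      sort_vars(lo, hi);   --procedure call with variables (a statement on its own)
      x<=lo;               --smaller value
      y<=hi;               --larger value
   END PROCESS;
END sorter_var;
```

Because variables exist only in sequential code, this call must be placed inside a PROCESS (or another
subprogram), whereas the signal-based sort_data of Example 19.9 could be called concurrently.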
19.16 VHDL Template for FSMs


As seen in Chapter 15, there are two fundamental aspects that characterize an FSM, one related to its
specifications and the other related to its physical structure. The specifications are normally translated
by means of a state transition diagram, like that in Figure 19.16(a), which says that the machine has
three states (A, B, and C), one output (y), and one input (x). Regarding the hardware, it can be modeled
as in Figure 19.16(b), which shows the system split into two sections, one sequential (contains
the flip-flops) and one combinational (contains the combinational circuits). The signal presently stored
in the DFFs is called pr_state, while that to be stored at the next (positive) clock transition is called
nx_state.


FIGURE 19.16. (a) Example of state transition diagram; (b) Simplified FSM model (for the hardware).
A VHDL template, resembling the diagram of Figure 19.16(b), is shown below [Pedroni04a]. First,
observe the block formed by lines 12-15. An enumerated data type is created in line 12, which contains
all machine states; next, the (optional) ENUM_ENCODING attribute is declared in lines 13-14, which allows
the user to choose the encoding scheme for the machine states (explained shortly); finally, a signal whose
type is that defined in line 12 is declared in line 15. Observe now that the code proper has three parts.
The first part (lines 18-25) creates the lower section of the FSM, which contains the flip-flops, so the clock
is connected to it and a process is needed. The second part (lines 27-44) creates the upper section of the
FSM; because it is combinational, the clock is not employed and the code can be concurrent or sequential
(because sequential code allows the construction of both types of circuits); the latter was employed in
the template, and CASE was used. Finally, the third part (lines 46-51) is optional; it can be used to store
("clean") the output when it is subject to glitches but glitches are not acceptable in the design (like in
signal generators).


1  -----------------------------------------------------
2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  -----------------------------------------------------
5  ENTITY <entity_name> IS
6     PORT (input: IN <data_type>;
7           clock, reset: IN STD_LOGIC;
8           output: OUT <data_type>);
9  END <entity_name>;
10 -----------------------------------------------------
11 ARCHITECTURE <arch_name> OF <entity_name> IS
12    TYPE state IS (A, B, C, ...);
13    [ATTRIBUTE ENUM_ENCODING: STRING;
14    ATTRIBUTE ENUM_ENCODING OF state: TYPE IS "sequential";]
15    SIGNAL pr_state, nx_state: state;
16 BEGIN
17    ----------- Lower section: -----------
18    PROCESS (reset, clock)
19    BEGIN
20       IF (reset='1') THEN
21          pr_state<=A;
22       ELSIF (clock'EVENT AND clock='1') THEN
23          pr_state<=nx_state;
24       END IF;
25    END PROCESS;
26    ----------- Upper section: -----------
27    PROCESS (input, pr_state)
28    BEGIN
29       CASE pr_state IS
30          WHEN A =>
31             IF (input=<value>) THEN
32                output<=<value>;
33                nx_state<=B;
34             ELSE ...
35             END IF;
36          WHEN B =>
37             IF (input=<value>) THEN
38                output<=<value>;
39                nx_state<=C;
40             ELSE ...
41             END IF;
42          WHEN ...
43       END CASE;
44    END PROCESS;
45    ------- Output section (optional): -------
46    PROCESS (clock)
47    BEGIN
48       IF (clock'EVENT AND clock='1') THEN
49          new_output<=old_output;
50       END IF;
51    END PROCESS;
52 END <arch_name>;
53 -----------------------------------------------------


ENUM_ENCODING This attribute allows the user to choose the encoding scheme for any enumerated
data type. Its set of options includes the following (see Section 15.9):

■ Sequential binary encoding (regular binary code, Section 2.1)
■ Gray encoding (see Section 2.3)
■ One-hot encoding (codewords with only one '1')
■ Default encoding (defined by the compiler, generally sequential binary or one-hot or a combination
  of these)
■ User-defined encoding (any other encoding)

Example:


TYPE state IS (A, B, C);
ATTRIBUTE ENUM_ENCODING: STRING;
ATTRIBUTE ENUM_ENCODING OF state: TYPE IS "11 00 10";


The following encoding results in this example: A="11", B="00", C="10". 
Example: 


TYPE state IS (red, green, blue, white, black); 
ATTRIBUTE ENUM_ENCODING: STRING; 
ATTRIBUTE ENUM_ENCODING OF state: TYPE IS "sequential";


The following encoding results in this case: red="000", green="001", blue="010", white="011", 
black="100". 


Note: When using Quartus II, the ENUM_ENCODING attribute causes the State Machine Viewer to be turned 
off. To keep it on and still choose the encoding style, instead of employing the attribute above, set up the com- 
piler using Assignments > Settings > Analysis & Synthesis Settings > More Settings and choose “minimal 
bits” to have an encoding similar to “sequential” or “one-hot” for one-hot encoding (the latter is Quartus II 
default). This solution, however, is not portable and does not allow encodings like “gray,” for example. 
EXAMPLE 19.10 BASIC STATE MACHINE


The code below implements the FSM of Figure 19.16(a). As can be seen, it is a straightforward 
application of the VHDL template described above. Note that the enumerated data type is called 
state (line 8) and that the optional attribute ENUM_ENCODING was employed (lines 9-10) to guarantee 
that the states are represented using sequential (regular) binary code. 


1  -----------------------------------------------------
2  ENTITY fsm IS
3     PORT (x, clk: IN BIT;
4           y: OUT BIT);
5  END fsm;
6  -----------------------------------------------------
7  ARCHITECTURE fsm OF fsm IS
8     TYPE state IS (A, B, C);
9     ATTRIBUTE ENUM_ENCODING: STRING;
10    ATTRIBUTE ENUM_ENCODING OF state: TYPE IS "sequential";
11    SIGNAL pr_state, nx_state: state;
12 BEGIN
13    -------- Lower section: --------
14    PROCESS (clk)
15    BEGIN
16       IF (clk'EVENT AND clk='1') THEN
17          pr_state<=nx_state;
18       END IF;
19    END PROCESS;
20    -------- Upper section: --------
21    PROCESS (x, pr_state)
22    BEGIN
23       CASE pr_state IS
24          WHEN A =>
25             y<='0';
26             IF (x='0') THEN nx_state<=B;
27             ELSE nx_state<=A;
28             END IF;
29          WHEN B =>
30             y<='0';
31             IF (x='1') THEN nx_state<=C;
32             ELSE nx_state<=B;
33             END IF;
34          WHEN C =>
35             y<=NOT x;
36             nx_state<=A;
37       END CASE;
38    END PROCESS;
39 END fsm;
40 -----------------------------------------------------
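
Example 19.10 does not use the optional output section of the template. A sketch of how that third
part could be added to the same machine is shown below. The entity name fsm_reg and the internal signal
y_comb are hypothetical, and the ENUM_ENCODING attribute was left out for brevity, so the encoding is
the compiler's default.

```vhdl
ENTITY fsm_reg IS   --hypothetical name
   PORT (x, clk: IN BIT;
         y: OUT BIT);
END fsm_reg;

ARCHITECTURE fsm_reg OF fsm_reg IS
   TYPE state IS (A, B, C);
   SIGNAL pr_state, nx_state: state;
   SIGNAL y_comb: BIT;   --combinational (possibly glitchy) version of the output
BEGIN
   -------- Lower section: --------
   PROCESS (clk)
   BEGIN
      IF (clk'EVENT AND clk='1') THEN
         pr_state<=nx_state;
      END IF;
   END PROCESS;
   -------- Upper section: --------
   PROCESS (x, pr_state)
   BEGIN
      CASE pr_state IS
         WHEN A =>
            y_comb<='0';
            IF (x='0') THEN nx_state<=B;
            ELSE nx_state<=A;
            END IF;
         WHEN B =>
            y_comb<='0';
            IF (x='1') THEN nx_state<=C;
            ELSE nx_state<=B;
            END IF;
         WHEN C =>
            y_comb<=NOT x;
            nx_state<=A;
      END CASE;
   END PROCESS;
   -------- Output section (optional): --------
   PROCESS (clk)
   BEGIN
      IF (clk'EVENT AND clk='1') THEN
         y<=y_comb;   --stored copy: updated only at positive clock edges
      END IF;
   END PROCESS;
END fsm_reg;
```

The price of the cleaner output is latency: y now changes only at a positive clock edge, so it follows
y_comb with a delay of up to one clock period, and one extra flip-flop is inferred.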
19.17 Exercises


1. VHDL packages 


Examine in the libraries that accompany your synthesis software the packages listed in Section 19.3. 
Write down at least the following: 


a. Data types and operators defined in the package standard. 
b. Data types and operators (if any) defined in the package std_logic_1164. 
c. Data types and operators (if any) defined in the package numeric_std. 
d. Data types and operators defined in the package std_logic_arith. 

2. Buffered multiplexer 


Redo the design in Example 19.1, but for a generic number of bits for a, b, c, d, and y (enter N using
the GENERIC statement). 


3. Data-type usage 
Consider the following VHDL objects: 


SIGNAL x1: BIT;
SIGNAL x2: BIT_VECTOR(7 DOWNTO 0);
SIGNAL x3: STD_LOGIC;
SIGNAL x4: STD_LOGIC_VECTOR(7 DOWNTO 0);
SIGNAL x5: INTEGER RANGE -35 TO 35;
VARIABLE y1: BIT_VECTOR(7 DOWNTO 0);
VARIABLE y2: INTEGER RANGE -35 TO 35;


a. Why are the statements below legal? 


x2(7)<=x1;
x3<=x4(0);
y2:=35;
y1:="11110000";


b. And why are these illegal? 
x1(0)<=x2(0);
x3<=x1;
x2<=(OTHERS=>'Z');
y2<=-35;
x3:='Z';
x2(7 DOWNTO 5)<=y1(3 DOWNTO 0);
y1(7)<='1';


4. Logical operators 


Suppose that a="11110001", b="11000000", and c="00000011". Determine the values of x, y, and z. 
a. x<=NOT (a XOR b)
b. y<=a XNOR b
c. z<=(a AND NOT b) OR (NOT a AND c)


5. Shift operators


Determine the result of each shift operation below. 
a. "11110001" SLL 3 
b. "11110001" SLA 2 
c. "10001000" SRA 2
d. "11100000" ROR 4 


6. Parity detector


Redo the design of Example 19.3, but use sequential instead of concurrent code (that is, a PROCESS 
with LOOP). 


7. Hamming weight (HW)


The HW of a vector is the number of '1's in it. Design a circuit that computes the HW of a generic- 
length vector (use GENERIC to enter the number of bits, N). 


8. Flip-flop inference


a. Briefly describe the main differences between SIGNAL and VARIABLE. When are flip-flops inferred? 


b. Write a code from which flip-flops are guaranteed to be inferred, then check the results in the 
compilation report. 


c. If the counter of Example 19.5 were a 0-to-999 counter, how many flip-flops would be required?


d. Modify the code of Example 19.5 for it to be a 0-to-999 counter, then compile and simulate it, 
finally checking whether the actual number of registers matches your prediction. 


9. Shift register


As seen in Section 14.1, a shift register (SR) is simply a string of serially connected flip-flops commanded 
by a common clock signal and, optionally, also by a reset signal. Design the SR of Figure 14.2(a) using 
the COMPONENT construct to instantiate N=4 DFFs. Can you make your code generic (arbitrary N)?


10. Function add_bitvector


Arithmetic operators, as defined in the original packages, do not support the type BIT_VECTOR (see 
Figure 19.5). Write a function (called add_bitvector) capable of adding two BIT_VECTOR signals and 
returning a signal of the same type. Develop two solutions: 


a. With the function located in a package. 


b. With the function located in the main code. 
Logic Circuits


Objective: Combinational circuits were studied in Chapters 11 and 12, with logic circuits in the 
former and arithmetic circuits in the latter. The same division is made in the VHDL examples, with combi- 
national logic designs presented in this chapter and combinational arithmetic designs illustrated in the next. 
20.1 Generic Address Decoder


Address decoders were studied in Section 11.5. We illustrate now the design of the address decoder 
of Figure 20.1 (borrowed from Figure 11.7) in the following two situations: 


■ With N=3 using the WHEN statement (for N=3, the truth table is shown in Figure 20.1).
■ Still using WHEN, but for arbitrary size (generic N).


Code for N=3 using the WHEN statement 


A VHDL code for this problem is shown below. As mentioned in Section 19.2, three sections of code are 
necessary. However, the first (library declarations) was omitted because only the standard libraries are 
employed in this example, and these are made visible automatically. 

The second section of code (ENTITY) is responsible for defining the circuit’s I/O ports (pins) and 
appears in lines 2-5 under the name address_decoder; it declares x as a 3-bit input and y as an 8-bit output, 
both of type BIT_VECTOR. 

The third section of code (ARCHITECTURE) is responsible for the code proper (circuit structure or 
behavior) and appears in lines 7-17 with the same name as the entity’s (it can be basically any name). 
The concurrent statement WHEN, seen in Section 19.9, was employed to implement the circuit (lines 9-16). 
Note that this solution is awkward because the code grows with N. 
x      y
000    00000001
001    00000010
010    00000100
011    00001000
100    00010000
101    00100000
110    01000000
111    10000000


FIGURE 20.1. Address decoder. The truth table is for the case of N=3. 


The dashed lines (lines 1, 6, and 18) were employed only to improve code organization and readability. 
Note also that x and y could have been declared as INTEGER instead of BIT_VECTOR. 


1  ------ Code for N=3 and WHEN: -----------------------
2  ENTITY address_decoder IS
3     PORT (x: IN BIT_VECTOR(2 DOWNTO 0);
4           y: OUT BIT_VECTOR(7 DOWNTO 0));
5  END address_decoder;
6  -----------------------------------------------------
7  ARCHITECTURE address_decoder OF address_decoder IS
8  BEGIN
9     y<="00000001" WHEN x="000" ELSE
10       "00000010" WHEN x="001" ELSE
11       "00000100" WHEN x="010" ELSE
12       "00001000" WHEN x="011" ELSE
13       "00010000" WHEN x="100" ELSE
14       "00100000" WHEN x="101" ELSE
15       "01000000" WHEN x="110" ELSE
16       "10000000";
17 END address_decoder;
18 -----------------------------------------------------


Simulation results from the code above are presented in Figure 20.2. The upper graph shows only the 
grouped values of x and y, while the second graph shows also the individual bits of x and y. In all other 
simulations that follow, only the former type of presentation will be exhibited (more compact). As can be 
observed, the circuit does operate as expected (x ranges from 0 to 7, while y has only one bit high—note 
that the values of y are all powers of 2). 


Code for arbitrary N, still with the WHEN statement 


The corresponding VHDL code is presented below where x is now declared as INTEGER (line 4). Note that
the size of the code is now fixed, regardless of N, whose value is entered using the GENERIC statement
(line 3). Therefore, by just changing the value of N in that line, any address decoder can be obtained. The
GENERATE statement (lines 10-12) was employed to create 2^N instances of y (that is, from y(0) to y(2^N-1)),
whose indexes were copied from x with the x'RANGE attribute (line 10). The label (mandatory) chosen
for this GENERATE was gen. For N=3, the simulation results are obviously the same as those seen in
Figure 20.2.


FIGURE 20.2. Simulation results from the address decoder of Figure 20.1, with N=3. The upper graph shows
only the grouped values of x and y, while the second graph shows also the individual bits of x and y (only the
former type of presentation will be exhibited in future examples).


1  ------ Code for arbitrary N: ------------------------
2  ENTITY address_decoder IS
3     GENERIC (N: INTEGER:=3); --can be any value
4     PORT (x: IN INTEGER RANGE 0 TO 2**N-1;
5           y: OUT BIT_VECTOR(2**N-1 DOWNTO 0));
6  END address_decoder;
7  -----------------------------------------------------
8  ARCHITECTURE address_decoder OF address_decoder IS
9  BEGIN
10    gen: FOR i IN x'RANGE GENERATE
11       y(i)<='1' WHEN i=x ELSE '0';
12    END GENERATE;
13 END address_decoder;
14 -----------------------------------------------------
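
The RANGE attribute is defined for array objects, and x is a scalar (INTEGER) in the generic code, so
some compilers may reject x'RANGE. A variant that takes the index range from y instead, which is a
BIT_VECTOR with the same 2^N indexes, is sketched below. It is an alternative offered under that
assumption, not the code from which the simulation results of Figure 20.2 were obtained.

```vhdl
ENTITY address_decoder IS
   GENERIC (N: INTEGER:=3);   --can be any value
   PORT (x: IN INTEGER RANGE 0 TO 2**N-1;
         y: OUT BIT_VECTOR(2**N-1 DOWNTO 0));
END address_decoder;

ARCHITECTURE address_decoder OF address_decoder IS
BEGIN
   --y'RANGE is 2**N-1 DOWNTO 0, so one assignment is generated per output bit
   gen: FOR i IN y'RANGE GENERATE
      y(i)<='1' WHEN i=x ELSE '0';
   END GENERATE;
END address_decoder;
```

An explicit range (FOR i IN 0 TO 2**N-1 GENERATE) would serve equally well; using y'RANGE simply keeps
the loop tied to the declared size of the output.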


20.2 BCD-to-SSD Conversion Function 


A circuit that converts 4-bit BCD (binary-coded decimal) vectors into 7-bit SSD (seven-segment display) 
vectors was described and designed in Example 11.4, with part of it repeated in Figure 20.3 below. The 
truth table assumes that they are common-cathode SSDs (see Figure 11.13). 

We show now the same design but using VHDL. However, the purpose of this exercise is to 
illustrate the construction of functions, so a function will first be designed and then called in the 
main code. 
Input(3:0)   decimal   output(6:0)   decimal
0000         0         1111110       126
0001         1         0110000       48
0010         2         1101101       109
0011         3         1111001       121
0100         4         0110011       51
0101         5         1011011       91
0110         6         1011111       95
0111         7         1110000       112
1000         8         1111111       127
1001         9         1111011       123
others       10-15     don't care


FIGURE 20.3. BCD-to-SSD converter. 


As seen in Section 19.14, VHDL functions can be located in several places, like the main code itself (in 
the declarative part of the architecture, for example) or a separate package. Because the latter is more 
commonly used (for code sharing and reusability), that is the option chosen in this example. 

A PACKAGE (lines 2-4), called my_functions, was created to install the FUNCTION named bcd_to_ssd
(line 3). Because a PACKAGE containing a FUNCTION (or PROCEDURE) declaration must be accompanied
by a PACKAGE BODY, in which the function body is located, a PACKAGE BODY was written in lines 6-26.
Moreover, because FUNCTION is a subprogram (therefore sequential), only sequential VHDL statements
(IF, CASE, LOOP, WAIT) can be used in it, with CASE chosen in this example (as seen in Section 19.10,
the main purpose of CASE is the creation of combinational circuits; more specifically, of LUTs). To
simplify the analysis of the simulation results, the decimal values corresponding to each SSD digit
were also included in lines 11-22. Note that the case in line 21 (which displays the letter "E") is
for error detection.

In this example, the main code contains only one line of code proper (line 11) in which a call to the 
bcd_to_ssd function is made. Note that a USE clause was included in line 2 to make the package contain- 
ing the function visible to the design. Corresponding simulation results are displayed in Figure 20.4 
where the decimal values listed in the code for the function can be observed. 


1  ------ Package: -------------------------------------
2  PACKAGE my_functions IS
3     FUNCTION bcd_to_ssd (SIGNAL input: INTEGER) RETURN BIT_VECTOR;
4  END my_functions;
5  -----------------------------------------------------
6  PACKAGE BODY my_functions IS
7     FUNCTION bcd_to_ssd (SIGNAL input: INTEGER) RETURN BIT_VECTOR IS
8        VARIABLE output: BIT_VECTOR(6 DOWNTO 0);
9     BEGIN
10       CASE input IS
11          WHEN 0=>output:="1111110"; --decimal 126
12          WHEN 1=>output:="0110000"; --decimal 48
13          WHEN 2=>output:="1101101"; --decimal 109
14          WHEN 3=>output:="1111001"; --decimal 121
15          WHEN 4=>output:="0110011"; --decimal 51
16          WHEN 5=>output:="1011011"; --decimal 91
17          WHEN 6=>output:="1011111"; --decimal 95
18          WHEN 7=>output:="1110000"; --decimal 112
19          WHEN 8=>output:="1111111"; --decimal 127
20          WHEN 9=>output:="1111011"; --decimal 123
21          WHEN OTHERS=>output:="1001111";
22             --letter "E" (Error) -> decimal 79
23       END CASE;
24       RETURN output;
25    END bcd_to_ssd;
26 END my_functions;
27 -----------------------------------------------------

1  ------ Main code: -----------------------------------
2  USE work.my_functions.all;
3  -----------------------------------------------------
4  ENTITY bcd_to_ssd_converter IS
5     PORT (x: IN INTEGER RANGE 0 TO 9;
6           y: OUT BIT_VECTOR(6 DOWNTO 0));
7  END bcd_to_ssd_converter;
8  -----------------------------------------------------
9  ARCHITECTURE decoder OF bcd_to_ssd_converter IS
10 BEGIN
11    y<=bcd_to_ssd(x);
12 END decoder;
13 -----------------------------------------------------
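
For comparison, the other location mentioned above (the function placed in the main code itself, in
the declarative part of the architecture) might look as sketched below. The function body is the same
as in the package version; only its placement changes, so no PACKAGE, PACKAGE BODY, or USE clause is
needed. This arrangement is an illustration, not the option adopted in the example.

```vhdl
ENTITY bcd_to_ssd_converter IS
   PORT (x: IN INTEGER RANGE 0 TO 9;
         y: OUT BIT_VECTOR(6 DOWNTO 0));
END bcd_to_ssd_converter;

ARCHITECTURE decoder OF bcd_to_ssd_converter IS
   --function constructed locally; visible only inside this architecture
   FUNCTION bcd_to_ssd (SIGNAL input: INTEGER) RETURN BIT_VECTOR IS
      VARIABLE output: BIT_VECTOR(6 DOWNTO 0);
   BEGIN
      CASE input IS
         WHEN 0=>output:="1111110";
         WHEN 1=>output:="0110000";
         WHEN 2=>output:="1101101";
         WHEN 3=>output:="1111001";
         WHEN 4=>output:="0110011";
         WHEN 5=>output:="1011011";
         WHEN 6=>output:="1011111";
         WHEN 7=>output:="1110000";
         WHEN 8=>output:="1111111";
         WHEN 9=>output:="1111011";
         WHEN OTHERS=>output:="1001111";   --letter "E" (Error)
      END CASE;
      RETURN output;
   END bcd_to_ssd;
BEGIN
   y<=bcd_to_ssd(x);
END decoder;
```

The trade-off is reusability: a function placed here cannot be called from any other design, which is
why the package version is generally preferred.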


20.3 Generic Multiplexer 


Multiplexers were studied in Section 11.6. A top-level diagram for that type of circuit is shown in
Figure 20.5(a) with an arbitrary number of inputs (M) as well as an arbitrary number of bits per input
(N). log2(M) bits (assuming that M is a power of 2) are needed in the input-select (sel) port.

The purpose of this example is to show how multi-dimensional data arrays can be created and 
manipulated with VHDL. 

The main input (x) can be specified in several ways, with one option depicted in Figure 20.5(b). 
Because this is a 2D array, not available in the predefined VHDL data types, it must be created. Such can 
be done in the main code itself or in a separate PACKAGE. However, in this particular example, the new 
type will be needed right in the beginning of the code (that is, in the ENTITY, to specify the x input), so 
the latter alternative must be adopted. Such a PACKAGE, called my_data_types, is shown in the code below, 
containing the new data type (called matrix, line 3). 

In the main code, the new data type is employed in the ENTITY to specify x (line 7). The circuit is 
constructed using the GENERATE statement to instantiate N assignments from x to y (lines 14-16). Note 
that a USE clause was necessary (line 2) to make the PACKAGE visible to the main code. Note also that this 
code is completely generic, that is, by simply changing the values of M and N in lines 5-6 any multiplexer 
can be obtained. 
[Figure 20.5(b): the input x as a 2D array with rows x(0, N-1) ... x(0, 0) through x(M-1, N-1) ... x(M-1, 0).]

1  ------ Package: -------------------------------------
2  PACKAGE my_data_types IS
3     TYPE matrix IS ARRAY (NATURAL RANGE <>, NATURAL RANGE <>) OF BIT;
4  END PACKAGE my_data_types;
5  -----------------------------------------------------
1  ------ Main code: -----------------------------------
2  USE work.my_data_types.all;
3  -----------------------------------------------------
4  ENTITY generic_mux IS
5     GENERIC (M: INTEGER := 4;   --number of inputs
6              N: INTEGER := 3);  --number of bits per input
7     PORT (x: IN matrix (0 TO M-1, N-1 DOWNTO 0);
8           sel: IN INTEGER RANGE 0 TO M-1;
9           y: OUT BIT_VECTOR (N-1 DOWNTO 0));
10 END generic_mux;
11 -----------------------------------------------------
12 ARCHITECTURE arch OF generic_mux IS
13 BEGIN
14    gen: FOR i IN N-1 DOWNTO 0 GENERATE
15       y(i)<=x(sel, i);
16    END GENERATE gen;
17 END arch;
18 -----------------------------------------------------


Corresponding simulation results can be observed in Figure 20.6, where small values were employed 
for M (=4) and N (=3) to simplify the visualization of the simulation results. 
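The behavior of the multiplexer can also be checked with a small self-checking testbench. The code below is only a sketch under stated assumptions: the testbench name (generic_mux_tb), its signal names, and the test pattern are hypothetical, and it presumes that my_data_types and generic_mux were compiled into the library work. Nested aggregates are used to initialize the 2D array, one inner aggregate per row.

```vhdl
USE work.my_data_types.all;

ENTITY generic_mux_tb IS
END generic_mux_tb;

ARCHITECTURE test OF generic_mux_tb IS
   CONSTANT M: INTEGER := 4;   -- number of inputs (same as in the text)
   CONSTANT N: INTEGER := 3;   -- number of bits per input
   -- Hypothetical test pattern: row i holds the binary code of i+1.
   SIGNAL x: matrix(0 TO M-1, N-1 DOWNTO 0) :=
      (('0','0','1'), ('0','1','0'), ('0','1','1'), ('1','0','0'));
   SIGNAL sel: INTEGER RANGE 0 TO M-1 := 0;
   SIGNAL y: BIT_VECTOR(N-1 DOWNTO 0);
BEGIN
   dut: ENTITY work.generic_mux
      GENERIC MAP (M => M, N => N)
      PORT MAP (x => x, sel => sel, y => y);

   PROCESS
   BEGIN
      FOR k IN 0 TO M-1 LOOP
         sel <= k;
         WAIT FOR 10 ns;
         -- Every output bit must equal the same bit of the selected row.
         FOR j IN N-1 DOWNTO 0 LOOP
            ASSERT y(j) = x(k, j)
               REPORT "mux output differs from the selected row"
               SEVERITY ERROR;
         END LOOP;
      END LOOP;
      WAIT;   -- end of stimuli
   END PROCESS;
END test;
```

If no assertion message is reported during simulation, y followed the row of x selected by sel for all M values of sel.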
20.4 Generic Priority Encoder


Priority encoders are combinational circuits, which were studied in Section 11.8. The option shown in 
Figure 11.21(b) will be designed now using VHDL and for an arbitrary value for N. 

A code for this problem is shown below. Even though the circuit is combinational, a sequential code 
was again employed (recall that concurrent code is only recommended for combinational circuits, while 
sequential code can implement both types of circuits, that is, sequential and combinational). 

To construct sequential code, a PROCESS was needed (lines 10-22) in which the LOOP statement was 
used (lines 13-20), combined with the IF statement, to detect a '1' in the input vector. Note that as soon 
as a '1' is found, the EXIT statement (line 16) causes LOOP to end. A VARIABLE (called temp, line 11) was
employed instead of using y directly (y is a SIGNAL) because a VARIABLE accepts multiple assignments 
and its value is updated immediately; only in line 21 is its value passed to y. 


2  ENTITY priority_encoder IS
3     GENERIC (N: INTEGER := 7);  --number of inputs
4     PORT (x: IN BIT_VECTOR(N DOWNTO 1);
5           y: OUT INTEGER RANGE 0 TO N);
6  END priority_encoder;
7  -----------------------------------------------------
8  ARCHITECTURE priority_encoder OF priority_encoder IS
9  BEGIN
10    PROCESS (x)
11       VARIABLE temp: INTEGER RANGE 0 TO N;
12    BEGIN
13       FOR i IN x'RANGE LOOP
14          IF (x(i)='1') THEN
15             temp := i;
16             EXIT;
17          ELSE
18             temp := 0;
19          END IF;
20       END LOOP;
21       y<=temp;
22    END PROCESS;
23 END priority_encoder;
24 -----------------------------------------------------


Simulation results are depicted in Figure 20.7 for N=7 (see Figure 11.21(b)). Note, however, that the 
code is generic (that is, the only change needed for any other priority-encoder size is in line 3). 




FIGURE 20.7. Simulation results from the priority encoder of Figure 11.21(b) with N=7. 
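The priority rule described above (the loop scans x from index N down to 1 and stops at the first '1') can be exercised with a short self-checking testbench. This is a sketch only: the testbench name and the stimuli are hypothetical, and it assumes that line 15 of the design assigns the loop index i to temp and that priority_encoder is compiled into the library work.

```vhdl
ENTITY priority_encoder_tb IS
END priority_encoder_tb;

ARCHITECTURE test OF priority_encoder_tb IS
   CONSTANT N: INTEGER := 7;
   SIGNAL x: BIT_VECTOR(N DOWNTO 1) := (OTHERS => '0');
   SIGNAL y: INTEGER RANGE 0 TO N;
BEGIN
   dut: ENTITY work.priority_encoder
      GENERIC MAP (N => N)
      PORT MAP (x => x, y => y);

   PROCESS
   BEGIN
      x <= "0000000"; WAIT FOR 10 ns;   -- no active input
      ASSERT y = 0 REPORT "expected y=0" SEVERITY ERROR;

      x <= "0000001"; WAIT FOR 10 ns;   -- only x(1) active
      ASSERT y = 1 REPORT "expected y=1" SEVERITY ERROR;

      x <= "0010110"; WAIT FOR 10 ns;   -- x(5), x(3), x(2) active
      ASSERT y = 5 REPORT "expected y=5" SEVERITY ERROR;

      x <= "1111111"; WAIT FOR 10 ns;   -- all active, x(7) wins
      ASSERT y = 7 REPORT "expected y=7" SEVERITY ERROR;

      WAIT;   -- end of stimuli
   END PROCESS;
END test;
```

The third stimulus is the informative one: several inputs are active at once, and only the one with the highest index determines y.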
[Figure 20.8(a): ROM with 2^M N-bit words. Figure 20.8(b): lookup table below.]

Address      Stored word
0000 (0)     1111110 (126)
0001 (1)     0110000 (48)
0010 (2)     1101101 (109)
0011 (3)     1111001 (121)
0100 (4)     0110011 (51)
0101 (5)     1011011 (91)
0110 (6)     1011111 (95)
0111 (7)     1110000 (112)
1000 (8)     1111111 (127)
1001 (9)     1111011 (123)

FIGURE 20.8. (a) ROM architecture; (b) Lookup table for BCD-to-SSD conversion.


20.5 Design of ROM Memory 


The construction of memories from a technology standpoint was discussed at length in Chapters 16 and 17. 
In this and in the next section we are interested in analyzing how VHDL can be used to implement ROM 
and RAM circuits using regular logic (not for instantiation of prefabricated memory blocks). The following 
four cases will be examined:


■ ROM memory (Section 20.5)

■ Synchronous RAM with separate data I/O buses (Section 20.6)

■ Synchronous RAM with single data I/O bus (Section 20.6)

■ Synchronous RAM with separate R/W address buses and separate data I/O buses (Section 20.6)


ROM memory 


A ROM memory is normally implemented using regular logic cells in CPLDs (an exception is the
MAX II CPLD series; see Section 18.3), or lookup tables (LUTs) in FPGAs (and in some CPLDs, like MAX II).
In both cases, the LUT model can be employed, as in Figure 20.8(a), with address as the only input and 
data (that is, the contents stored in the addressed location) as the only output. 

The use of this type of circuit is exemplified in Figure 20.8(b), which shows a conversion table from 
BCD (binary-coded decimal) to SSD (seven-segment display), which was seen in Example 11.4 and also 
in Section 20.2. The former is entered through the address bus, causing the latter to be retrieved through 
the data bus. This type of conversion is needed, for example, when the output of a decade counter must 
be displayed by SSDs (as in Example 11.4). 


VHDL code 

A VHDL code for this ROM is shown below. In line 8 a new data type (called memory) was 
specified, which allows the creation of a 1D x 1D array with a total of 10 x 7 bits. Next, a CONSTANT 
(called rom) was declared in line 9 as conforming with the new data type for which ten 
7-bit values were specified in lines 10-19. Finally, in the code proper (line 21), a memory-read 
operation occurs. 


2  ENTITY memory1 IS
3     PORT (address: IN INTEGER RANGE 0 TO 9;
4           data: OUT BIT_VECTOR(6 DOWNTO 0));
5  END memory1;
6  -----------------------------------------------------
7  ARCHITECTURE memory1 OF memory1 IS
8     TYPE memory IS ARRAY (0 TO 9) OF BIT_VECTOR(6 DOWNTO 0);
9     CONSTANT rom: memory := (
10       "1111110",
11       "0110000",
12       "1101101",
13       "1111001",
14       "0110011",
15       "1011011",
16       "1011111",
17       "1110000",
18       "1111111",
19       "1111011");
20 BEGIN
21    data<=rom(address);
22 END memory1;
23 -----------------------------------------------------


Solution with FUNCTION 

A different approach is presented next. This time the BCD-to-SSD conversion table (ROM) was created 
using a FUNCTION (Section 19.14). The function (called bcd_to_ssd) was constructed in a PACKAGE (as in
Section 20.2), so it can be reused and shared by other designs. In the main code, just a function call 
(line 11) is needed to produce the desired circuit. 


1  ------ Package: -------------------------------------
2  PACKAGE my_package IS
3     FUNCTION bcd_to_ssd (SIGNAL bcd: INTEGER) RETURN BIT_VECTOR;
4  END my_package;
5  -----------------------------------------------------
6  PACKAGE BODY my_package IS
7     FUNCTION bcd_to_ssd (SIGNAL bcd: INTEGER) RETURN BIT_VECTOR IS
8        TYPE memory IS ARRAY (0 TO 9) OF BIT_VECTOR(6 DOWNTO 0);
9        CONSTANT rom: memory := (
10          "1111110",
11          "0110000",
12          "1101101",
13          "1111001",
14          "0110011",
15          "1011011",
16          "1011111",
17          "1110000",
18          "1111111",
19          "1111011");
20    BEGIN
21       RETURN rom(bcd);
22    END bcd_to_ssd;
23 END my_package;
24 -----------------------------------------------------
1  ------ Main code: -----------------------------------
2  USE work.my_package.all;
3  -----------------------------------------------------
4  ENTITY ssd_driver IS
5     PORT (bcd: IN INTEGER RANGE 0 TO 9;
6           ssd: OUT BIT_VECTOR(6 DOWNTO 0));
7  END ssd_driver;
8  -----------------------------------------------------
9  ARCHITECTURE ssd_driver OF ssd_driver IS
10 BEGIN
11    ssd<=bcd_to_ssd(bcd);
12 END ssd_driver;
13 -----------------------------------------------------


20.6 Design of Synchronous RAM Memories 


As mentioned in Section 20.5, we are interested in describing how VHDL can be used to implement 
ROM and RAM circuits using regular logic (not for instantiation of prefabricated memory blocks), with 
the following four cases included: 


■ ROM memory (Section 20.5)

■ Synchronous RAM with separate data I/O buses (Section 20.6)

■ Synchronous RAM with single data I/O bus (Section 20.6)

■ Synchronous RAM with separate R/W address buses and separate data I/O buses (Section 20.6)


Synchronous RAM with separate data I/O buses 


Figure 20.9 depicts a synchronous RAM memory with separate data I/O buses (data_in and data_out). 
For all memories, it is assumed that the number of bits in the address and data buses are M and N, 
respectively, so the memory contains 2^M N-bit words. Observe that, contrary to the ROM seen above, the
circuit in Figure 20.9 is synchronous (clocked, implemented with flip-flops). 

A VHDL code for this RAM is presented below. Note that none of the ports (lines 8-11) is bidirectional. 
As before, a new data type (called memory, line 15) was defined to allow the creation of a 1D x 1D array 
with a total of 2^M·N bits. In line 16, a signal called ram was declared as belonging to that data type. In
the code proper (ARCHITECTURE), a PROCESS (lines 18-25) was used to create the sequential part of the 
circuit, that is, the flip-flops that store data_in when a positive clock edge occurs while the input called 
write (write-enable) is asserted. Finally, in line 26, the output data bus was created. 


2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  -----------------------------------------------------
5  ENTITY memory2 IS
6     GENERIC (N: INTEGER := 8;   --Width of data bus
7              M: INTEGER := 4);  --Width of address bus
8     PORT (clk, write: IN STD_LOGIC;
9           address: IN INTEGER RANGE 0 TO 2**M-1;
10          data_in: IN STD_LOGIC_VECTOR(N-1 DOWNTO 0);
11          data_out: OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0));
12 END memory2;
13 -----------------------------------------------------
14 ARCHITECTURE memory2 OF memory2 IS
15    TYPE memory IS ARRAY (0 TO 2**M-1) OF STD_LOGIC_VECTOR(N-1 DOWNTO 0);
16    SIGNAL ram: memory;
17 BEGIN
18    PROCESS (clk)
19    BEGIN
20       IF (clk'EVENT AND clk='1') THEN
21          IF (write='1') THEN
22             ram(address) <= data_in;
23          END IF;
24       END IF;
25    END PROCESS;
26    data_out <= ram(address);
27 END memory2;
28 -----------------------------------------------------

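The write-then-read behavior of this RAM (synchronous write, combinational read of the addressed word) can be checked with a testbench along the following lines. It is a sketch under stated assumptions, not part of the original design: the names memory2_tb, wr_en, and done are hypothetical, the 20 ns clock period is arbitrary, and memory2 is assumed to be compiled into the library work.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY memory2_tb IS
END memory2_tb;

ARCHITECTURE test OF memory2_tb IS
   CONSTANT N: INTEGER := 8;
   CONSTANT M: INTEGER := 4;
   SIGNAL clk: STD_LOGIC := '0';
   SIGNAL wr_en: STD_LOGIC := '0';
   SIGNAL address: INTEGER RANGE 0 TO 2**M-1 := 0;
   SIGNAL data_in, data_out: STD_LOGIC_VECTOR(N-1 DOWNTO 0);
   SIGNAL done: BOOLEAN := FALSE;
BEGIN
   dut: ENTITY work.memory2
      GENERIC MAP (N => N, M => M)
      PORT MAP (clk => clk, write => wr_en, address => address,
                data_in => data_in, data_out => data_out);

   -- Clock generator (20 ns period), stopped when the stimuli end.
   PROCESS
   BEGIN
      WHILE NOT done LOOP
         clk <= '0'; WAIT FOR 10 ns;
         clk <= '1'; WAIT FOR 10 ns;
      END LOOP;
      WAIT;
   END PROCESS;

   PROCESS
   BEGIN
      -- Write "10100101" into address 3 at the next rising edge.
      address <= 3; data_in <= "10100101"; wr_en <= '1';
      WAIT UNTIL clk = '1';
      WAIT FOR 1 ns;
      ASSERT data_out = "10100101" REPORT "word 3 not written" SEVERITY ERROR;

      -- Write "00111100" into address 7.
      address <= 7; data_in <= "00111100";
      WAIT UNTIL clk = '1';
      WAIT FOR 1 ns;
      wr_en <= '0';
      ASSERT data_out = "00111100" REPORT "word 7 not written" SEVERITY ERROR;

      -- Read back address 3 without writing; no clock edge is needed.
      address <= 3;
      WAIT FOR 1 ns;
      ASSERT data_out = "10100101" REPORT "word 3 was lost" SEVERITY ERROR;

      done <= TRUE;
      WAIT;
   END PROCESS;
END test;
```

The last check illustrates that only the write is clocked: data_out follows address immediately because line 26 of the design is a concurrent assignment.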

Synchronous RAM with single data I/O bus 


A synchronous RAM with single data I/O bus is depicted in Figure 20.10. Its fundamental difference 
from the previous RAM is that now the data bus is bidirectional, so a tri-state buffer is needed to turn the 
output off when data must be written into the RAM. 


FIGURE 20.9. RAM memory with separate data I/O buses.

FIGURE 20.10. Synchronous RAM with a single (bidirectional) data I/O bus.
A VHDL code for this circuit is presented below. As before, two generic parameters (N and M, lines
6-7) were employed to specify the widths of the data and address buses. Note also that now one of the 
ports is bidirectional (data, line 10). To construct the sequential part of the circuit (that is, the flip-flop 
bank) a PROCESS was again employed (lines 18-25), while the combinational part (tri-state buffer) was 
constructed with the concurrent statement WHEN (line 26). 


2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  -----------------------------------------------------
5  ENTITY memory3 IS
6     GENERIC (N: INTEGER := 8;   --Width of data bus
7              M: INTEGER := 4);  --Width of address bus
8     PORT (clk, write: IN STD_LOGIC;
9           address: IN INTEGER RANGE 0 TO 2**M-1;
10          data: INOUT STD_LOGIC_VECTOR(N-1 DOWNTO 0));
11 END memory3;
12 -----------------------------------------------------
13 ARCHITECTURE memory3 OF memory3 IS
14    TYPE memory IS ARRAY (0 TO 2**M-1) OF
15       STD_LOGIC_VECTOR(N-1 DOWNTO 0);
16    SIGNAL ram: memory;
17 BEGIN
18    PROCESS (clk)
19    BEGIN
20       IF (clk'EVENT AND clk='1') THEN
21          IF (write='1') THEN
22             ram(address) <= data;
23          END IF;
24       END IF;
25    END PROCESS;
26    data <= ram(address) WHEN write='0' ELSE (OTHERS=>'Z');
27 END memory3;
28 -----------------------------------------------------

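Because the data bus is now shared, whatever drives it from the outside must release it (drive 'Z') during read cycles, just as the RAM releases it during write cycles. The testbench below sketches this handshake. It is an illustration under stated assumptions: the names memory3_tb, wr_en, and done are hypothetical, the timing values are arbitrary, and memory3 is assumed to be compiled into the library work.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY memory3_tb IS
END memory3_tb;

ARCHITECTURE test OF memory3_tb IS
   CONSTANT N: INTEGER := 8;
   CONSTANT M: INTEGER := 4;
   SIGNAL clk: STD_LOGIC := '0';
   SIGNAL wr_en: STD_LOGIC := '0';
   SIGNAL address: INTEGER RANGE 0 TO 2**M-1 := 0;
   -- Resolved bus with two drivers: this testbench and the RAM.
   SIGNAL data: STD_LOGIC_VECTOR(N-1 DOWNTO 0);
   SIGNAL done: BOOLEAN := FALSE;
BEGIN
   dut: ENTITY work.memory3
      GENERIC MAP (N => N, M => M)
      PORT MAP (clk => clk, write => wr_en, address => address,
                data => data);

   -- Clock generator (20 ns period), stopped when the stimuli end.
   PROCESS
   BEGIN
      WHILE NOT done LOOP
         clk <= '0'; WAIT FOR 10 ns;
         clk <= '1'; WAIT FOR 10 ns;
      END LOOP;
      WAIT;
   END PROCESS;

   PROCESS
   BEGIN
      -- Write cycle: the testbench drives the bus, the RAM output is 'Z'.
      address <= 5; wr_en <= '1'; data <= "11000011";
      WAIT UNTIL clk = '1';
      WAIT FOR 1 ns;

      -- Read cycle: release the bus so that only the RAM drives it.
      wr_en <= '0'; data <= (OTHERS => 'Z');
      WAIT FOR 1 ns;
      ASSERT data = "11000011" REPORT "read-back mismatch" SEVERITY ERROR;

      done <= TRUE;
      WAIT;
   END PROCESS;
END test;
```

If the testbench kept driving a value during the read cycle, the resolved bus could show 'X' wherever its bits disagree with the word stored in the RAM.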

Synchronous RAM with separate R/W address buses and 
separate data I/O buses 


The last memory to be discussed in this section is depicted in Figure 20.11. It is a RAM with separate 
buses for data I/O, as well as separate read/write address buses, so read and write operations can be 
performed independently and are controlled by two separate clocks (clk1 for writing, clk2 for reading). 
The total number of flip-flops is now (2^M+1)·N instead of 2^M·N because the word selected by rd_address
is stored at the data_out output.

A VHDL code for this RAM is presented next. Its overall structure is similar to that for the circuit of 
Figure 20.9, with the differences that now there are two address buses, two clocks, and an additional set 
of flip-flops. Because this circuit has only sequential parts, two processes were employed, one for the 
overall register bank (lines 20-27) and the other for the output register (lines 28-33). 


FIGURE 20.11. Synchronous RAM with separate R/W address and data I/O buses.

2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  -----------------------------------------------------
5  ENTITY memory4 IS
6     GENERIC (N: INTEGER := 8;   --Width of data bus
7              M: INTEGER := 4);  --Width of address bus
8     PORT (clk1, clk2, write: IN STD_LOGIC;
9           rd_address: IN INTEGER RANGE 0 TO 2**M-1;
10          wr_address: IN INTEGER RANGE 0 TO 2**M-1;
11          data_in: IN STD_LOGIC_VECTOR(N-1 DOWNTO 0);
12          data_out: OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0));
13 END memory4;
14 -----------------------------------------------------
15 ARCHITECTURE memory4 OF memory4 IS
16    TYPE memory IS ARRAY (0 TO 2**M-1) OF
17       STD_LOGIC_VECTOR(N-1 DOWNTO 0);
18    SIGNAL ram: memory;
19 BEGIN
20    PROCESS (clk1)
21    BEGIN
22       IF (clk1'EVENT AND clk1='1') THEN
23          IF (write='1') THEN
24             ram(wr_address) <= data_in;
25          END IF;
26       END IF;
27    END PROCESS;
28    PROCESS (clk2)
29    BEGIN
30       IF (clk2'EVENT AND clk2='1') THEN
31          data_out <= ram(rd_address);
32       END IF;
33    END PROCESS;
34 END memory4;
20.7 Exercises


In all exercises below, the VHDL code must be written, compiled, debugged, and then carefully 
simulated. 


1. Address decoder #1

This exercise regards the address decoder designed in Section 20.1, for N=3. In it, the “simple” WHEN
statement was employed (that is, WHEN/ELSE). Rewrite that code employing the “selected” WHEN
(that is, WITH/SELECT/WHEN).


2. Address decoder #2

This exercise again regards the address decoder designed in Section 20.1. Part (b) is generic, that
is, allows any value of N (where N is the number of bits in the address bus). Suppose that, instead
of N, the number of bit lines (that is, the size of the memory pile, M = 2^N) is wanted as the arbitrary
parameter. Modify the code with M in place of N in line 3.


3. Address decoder #3

The address decoder designs of Section 20.1 employed concurrent VHDL code (which is recommended
only for combinational circuits, and address decoders are combinational). However,
sequential VHDL code allows the construction of sequential as well as combinational circuits.
Rewrite the code for part (b) using only sequential VHDL statements (IF, CASE, LOOP, and WAIT,
which must be located inside a PROCESS).


4. BCD-to-SSD function

Repeat the BCD-to-SSD converter design of Section 20.2, this time with the FUNCTION (Section 19.14)
located in the main code instead of in a PACKAGE. More specifically, locate it in the declarative part of
the ARCHITECTURE.


5. Multiplexer #1

This question refers to the multiplexer designed in Section 20.3. Rewrite the code, making as many
simplifications as possible, for the particular case of N=1 (but with M still generic). In this case, is it
necessary to create a special data type?


6. Multiplexer #2

Can you write a VHDL code for the multiplexer of Section 20.3, with both N and M still generic,
using only predefined data types? (In other words, without using the type called matrix in that code
or any other user-defined data type.)


7. Parity generator

Parity detectors were studied in Section 11.7. As a continuation, Figure E20.7 shows the diagram
of a parity generator. The circuit has a 7-bit input, a, and an 8-bit output, b. It also has a
single-bit parity-selection input, called parity. The circuit must detect the parity of a, then add an extra
bit to it (on its left) to produce b, whose parity (number of '1's) must be odd if parity='0' or even if
parity='1'. Design this circuit using VHDL.

FIGURE E20.7.


8. Priority encoder 


Write a VHDL code to implement the priority encoder of Figure 11.21(a) (see equations in Section 11.8). 
The number of inputs (N) should be a generic parameter.


9. Binary sorter #1 
Write a VHDL code to implement the binary sorter of Figure 11.23, for N=5. 
a. Do it using concurrent code. 
b. Repeat it with sequential code. 
10. Binary sorter #2 


Repeat the VHDL design for the binary sorter of Figure 11.23, this time with N (number of inputs) 
entered as a generic parameter. 
Objective: Chapter 20 showed VHDL designs for combinational logic circuits. The second and final
part of combinational circuits is presented in this chapter, which exhibits VHDL designs for combinational 
arithmetic circuits (studied in Chapter 12). 


Chapter Contents 
21.1 Carry-Ripple Adder

21.2 Carry-Lookahead Adder

21.3 Signed and Unsigned Adders/Subtracters

21.4 Signed and Unsigned Multipliers/Dividers

21.5 ALU

21.6 Exercises


21.1  Carry-Ripple Adder 


In this section and in the next, the physical structures of adders/subtracters are explored, while in Section 21.3 
the types of addition/subtraction (that is, unsigned or signed) are considered. 

Adders are combinational arithmetic circuits, studied in Sections 12.2 and 12.3. Of all multibit adders, the 
carry-ripple adder (seen in Figure 12.3, partially repeated in Figure 21.1 below) is the simplest one (it has the 
least amount of hardware). Even though we have already illustrated this type of design in Example 19.7, 
it is included here for completeness and also to show its design without using the COMPONENT construct 
(showing such construct was the purpose of Example 19.7). 

To implement this structural design, it is important to recall this adder’s equations, given in Section 12.2, 
that is: 


s = a ⊕ b ⊕ cin

cout = a·b + a·cin + b·cin

A corresponding VHDL code is shown below, with the expressions above appearing in lines 20-22.
Note that N is a generic parameter (line 6). To create N instances of the assignments, the LOOP statement
was used in lines 19-23. If the data types BIT and BIT_VECTOR were employed instead of STD_LOGIC
and STD_LOGIC_VECTOR, then the first section of the code (library declarations, lines 2 and 3) could be
suppressed.
FIGURE 21.1. Carry-ripple adder architecture.

1  -----------------------------------------------------
2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  -----------------------------------------------------
5  ENTITY carry_ripple_adder IS
6     GENERIC (N : INTEGER := 8);  --number of bits
7     PORT (a, b: IN STD_LOGIC_VECTOR(N-1 DOWNTO 0);
8           cin: IN STD_LOGIC;
9           s: OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0);
10          cout: OUT STD_LOGIC);
11 END carry_ripple_adder;
12 -----------------------------------------------------
13 ARCHITECTURE structure OF carry_ripple_adder IS
14 BEGIN
15    PROCESS(a, b, cin)
16       VARIABLE carry : STD_LOGIC_VECTOR (N DOWNTO 0);
17    BEGIN
18       carry(0) := cin;
19       FOR i IN 0 TO N-1 LOOP
20          s(i) <= a(i) XOR b(i) XOR carry(i);
21          carry(i+1) := (a(i) AND b(i)) OR (a(i) AND
22                        carry(i)) OR (b(i) AND carry(i));
23       END LOOP;
24       cout <= carry(N);
25    END PROCESS;
26 END structure;
27 -----------------------------------------------------


Simulation results are displayed in Figure 21.2. As expected, because a, b, and s are all 8-bit vectors, 
whenever a + b > 255 occurs, the carry-out bit is asserted and 256 is subtracted from s. The glitch observed
at the output is absolutely normal because in the real world the sum bits cannot all change exactly at the 
same time. 
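The same two equations can also be written with purely concurrent code, replacing the PROCESS/LOOP pair with a GENERATE statement and the VARIABLE with an internal SIGNAL. The version below is a sketch of that alternative, not the design used for the simulations; the entity name carry_ripple_adder_gen is hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY carry_ripple_adder_gen IS
   GENERIC (N: INTEGER := 8);   -- number of bits
   PORT (a, b: IN STD_LOGIC_VECTOR(N-1 DOWNTO 0);
         cin: IN STD_LOGIC;
         s: OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0);
         cout: OUT STD_LOGIC);
END carry_ripple_adder_gen;

ARCHITECTURE structure OF carry_ripple_adder_gen IS
   -- carry(i) is the carry into bit i; carry(N) is the carry-out.
   SIGNAL carry: STD_LOGIC_VECTOR(N DOWNTO 0);
BEGIN
   carry(0) <= cin;
   gen: FOR i IN 0 TO N-1 GENERATE
      -- One full-adder cell per bit position.
      s(i) <= a(i) XOR b(i) XOR carry(i);
      carry(i+1) <= (a(i) AND b(i)) OR (a(i) AND carry(i)) OR
                    (b(i) AND carry(i));
   END GENERATE gen;
   cout <= carry(N);
END structure;
```

A SIGNAL is needed here because a VARIABLE can only be declared and used in sequential code; each element of carry receives exactly one concurrent assignment, so there are no multiple drivers.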


21.2 Carry-Lookahead Adder 


Similar to Section 21.1, this section also deals with an adder’s physical structure. However, a
higher-performance circuit will now be designed (at the expense of silicon and power).

Carry-lookahead adders were studied in Section 12.3 (see Figures 12.9 and 12.10), with Equations 12.10
and 12.11 employed to compute the G (generate) and P (propagate) signals and Equations 12.17-12.20
used for computing the carry bits. To make the design even more realistic, we assume that a large adder
is needed (32 bits). Therefore, as explained in Section 12.3, the design must be broken into several blocks
because the usefulness of the carry-lookahead approach is normally limited to about four or so bits.
Consequently, the adder will be broken into eight blocks of 4 bits each, with each block operating as a true
carry-lookahead adder, and with the interblock connections performed in a carry-ripple fashion (as in
Section 21.1).

FIGURE 21.2. Simulation results from the VHDL code for the carry-ripple adder of Figure 21.1.

FIGURE 21.3. Simulation results from the 32-bit adder designed in Section 21.2.

A corresponding VHDL code is shown below. The 4-bit carry-lookahead adder was constructed to be 
instantiated later in the main code as a COMPONENT. Recall that a COMPONENT is just a conventional piece 
of VHDL code (that is, library declarations + ENTITY +ARCHITECTURE), which can be seen in the first part 
(37 lines) of the code below, where the carry-lookahead equations mentioned above were employed. The 
data types STD_LOGIC and STD_LOGIC_VECTOR were chosen, but BIT and BIT_VECTOR would also do. 
The name chosen for this part of the design (determined by the ENTITY's name) is carry_lookahead_adder.

In the main code, the 4-bit carry-lookahead adder is instantiated using the COMPONENT statement. 
Recall, however, that a component must always be declared before it is instantiated. This can be done in 
several places, including the architecture’s declarative part, which was the chosen option in this example 
(lines 15-20 of the main code). An internal signal, called carry, was defined in line 13 to deal with the 
carry-out signal from each 4-bit block. The global port cin (carry-in) was assigned to the carry-in input 
of the first block (line 23), while the carry-out bit from the last block was assigned to the global port 
cout (line 32). The 4-bit adder is instantiated in lines 25-30 (under the label adder). Note that, to avoid 
writing the whole component eight times, the GENERATE statement (labeled gen_adder) was employed 
(lines 24-31). Note also that positional mapping was employed in the PORT MAP section of the component 
instantiation. Finally, simulation results are depicted in Figure 21.3. 
1  ------ 4-bit carry-lookahead adder: -----------------
2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  -----------------------------------------------------
5  ENTITY carry_lookahead_adder IS
6     PORT (a, b: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
7           cin: IN STD_LOGIC;
8           sum: OUT STD_LOGIC_VECTOR(3 DOWNTO 0);
9           cout: OUT STD_LOGIC);
10 END carry_lookahead_adder;
11 -----------------------------------------------------
12 ARCHITECTURE structure OF carry_lookahead_adder IS
13    SIGNAL G, P, c: STD_LOGIC_VECTOR(3 DOWNTO 0);
14 BEGIN
15    --- Computation of G and P:
16    G <= a AND b;
17    P <= a XOR b;
18    --- Computation of carry:
19    c(0) <= cin;
20    c(1) <= G(0) OR
21            (P(0) AND cin);
22    c(2) <= G(1) OR
23            (P(1) AND G(0)) OR
24            (P(1) AND P(0) AND cin);
25    c(3) <= G(2) OR
26            (P(2) AND G(1)) OR
27            (P(2) AND P(1) AND G(0)) OR
28            (P(2) AND P(1) AND P(0) AND cin);
29    cout <= G(3) OR
30            (P(3) AND G(2)) OR
31            (P(3) AND P(2) AND G(1)) OR
32            (P(3) AND P(2) AND P(1) AND G(0)) OR
33            (P(3) AND P(2) AND P(1) AND P(0) AND cin);
34    --- Computation of sum:
35    sum <= P XOR c;
36 END structure;
37 -----------------------------------------------------

1  ------ Main code: -----------------------------------
2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  -----------------------------------------------------
5  ENTITY big_adder IS
6     PORT (a, b: IN STD_LOGIC_VECTOR(31 DOWNTO 0);
7           cin: IN STD_LOGIC;
8           sum: OUT STD_LOGIC_VECTOR(31 DOWNTO 0);
9           cout: OUT STD_LOGIC);
10 END big_adder;
11 -----------------------------------------------------
12 ARCHITECTURE big_adder OF big_adder IS
13    SIGNAL carry: STD_LOGIC_VECTOR(8 DOWNTO 0);
14    --------------------------------------------------
15    COMPONENT carry_lookahead_adder IS
16       PORT (a, b: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
17             cin: IN STD_LOGIC;
18             sum: OUT STD_LOGIC_VECTOR(3 DOWNTO 0);
19             cout: OUT STD_LOGIC);
20    END COMPONENT;
21    --------------------------------------------------
22 BEGIN
23    carry(0) <= cin;
24    gen_adder: FOR i IN 1 TO 8 GENERATE
25       adder: carry_lookahead_adder PORT MAP (
26          a(4*i-1 DOWNTO 4*i-4),
27          b(4*i-1 DOWNTO 4*i-4),
28          carry(i-1),
29          sum(4*i-1 DOWNTO 4*i-4),
30          carry(i));
31    END GENERATE;
32    cout <= carry(8);
33 END big_adder;
34 -----------------------------------------------------


21.3 Signed and Unsigned Adders/Subtracters 


As seen in Sections 3.2 and 12.5, the construction of an adder is not affected by the fact that the system 
is signed or unsigned because in both systems the adder treats the numbers exactly in the same way. 
The only difference is the existence of additional two’s complement circuitry and the way the results are 
interpreted, which leads to distinct overflow-check criteria. 

As a conclusion, when using VHDL to design an adder/subtracter, the only aspect to worry about 
regards the data types because for some predefined types the addition (+) and subtraction (-) operators
are also predefined in the standard libraries, while for others they are not (so data conversion functions 
are needed). This is precisely what the designs in this section illustrate. 


Adder/subtracter with INTEGER inputs/outputs 


The type INTEGER is defined in the package standard of the library std (see Section 19.3). This package
also includes the '+' and '-' operators (among others), so the code is straightforward, as shown
below. Note that all signals are specified as INTEGER (lines 4 and 5), so the sum and subtraction
(lines 10 and 11) can be computed directly. As in the other examples, N (number of bits in each signal)
is specified using the GENERIC statement (line 3). Observe also that the number of bits at the
output is the same as at the input (which is the usual form in computer-based systems), so overflow
can occur.


1  ------ Adder/subtracter with INTEGER: ---------------
2  ENTITY adder_subtracter IS
3     GENERIC (N: INTEGER := 8);  --number of input bits
4     PORT (a, b: IN INTEGER RANGE 0 TO 2**N-1;
5           sum, sub: OUT INTEGER RANGE 0 TO 2**N-1);
6  END adder_subtracter;
7  -----------------------------------------------------
8  ARCHITECTURE adder_subtracter OF adder_subtracter IS
9  BEGIN
10    sum <= a + b;
11    sub <= a - b;
12 END adder_subtracter;
13 -----------------------------------------------------


Simulation results are depicted in Figure 21.4. As mentioned above, these are not affected by the fact 
that the system is signed or unsigned, only the interpretation of the results is. For example, suppose that 
the system is unsigned, so 125 (="01111101") and 150 (="10010110") are both positive values, producing
125 + 150 = 275; due to overflow, this number is represented as 275 - 256 = 19 (an incorrect result). On the
other hand, if the system is signed, then 125 (="01111101") is still positive, while 150 (="10010110") is
now negative (it is the two's complement representation of 150 - 256 = -106); therefore, 125 + (-106) = 19
(a correct result). In summary, the result is the same (=19) in both cases, but it is incorrect in the former 
and correct in the latter. 
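The two overflow-check criteria mentioned at the beginning of this section can be made explicit in hardware. The code below is a sketch under stated assumptions, not one of the designs simulated in this chapter: the entity name adder_with_overflow and the port names ovf_unsigned and ovf_signed are hypothetical, and the IEEE-standard numeric_std package is used for the arithmetic. In the unsigned interpretation, overflow is the carry-out of the N-bit sum; in the signed interpretation, overflow occurs when both operands have the same sign and the sum has the opposite sign.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;

ENTITY adder_with_overflow IS
   GENERIC (N: INTEGER := 8);   -- number of bits
   PORT (a, b: IN STD_LOGIC_VECTOR(N-1 DOWNTO 0);
         sum: OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0);
         ovf_unsigned: OUT STD_LOGIC;   -- '1' if a+b does not fit as unsigned
         ovf_signed: OUT STD_LOGIC);    -- '1' if a+b does not fit as signed
END adder_with_overflow;

ARCHITECTURE rtl OF adder_with_overflow IS
   SIGNAL wide: UNSIGNED(N DOWNTO 0);   -- N+1 bits, to keep the carry-out
BEGIN
   -- The same bit-level addition serves both interpretations.
   wide <= RESIZE(UNSIGNED(a), N+1) + RESIZE(UNSIGNED(b), N+1);
   sum <= STD_LOGIC_VECTOR(wide(N-1 DOWNTO 0));

   -- Unsigned criterion: carry-out of the most significant bit.
   ovf_unsigned <= wide(N);

   -- Signed criterion: equal operand signs, different sum sign.
   ovf_signed <= (a(N-1) XNOR b(N-1)) AND (a(N-1) XOR wide(N-1));
END rtl;
```

For the example above (a = "01111101", b = "10010110"), sum is "00010011" (19), ovf_unsigned is '1' because 275 does not fit in 8 bits, and ovf_signed is '0' because the operands have opposite signs, in agreement with the two interpretations discussed in the text.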


Adder/subtracter with STD_LOGIC_VECTOR inputs/outputs 


The code below implements the same adder/subtracter but now with all I/O signals specified as
STD_LOGIC_VECTOR (lines 9 and 10). The std_logic_1164 package (lines 2 and 3) is now needed because it is in
that package that the type STD_LOGIC_VECTOR is defined. Observe also the inclusion of the std_logic_signed
package in line 4 because it contains the functions '+' and '-' for STD_LOGIC_VECTOR. Finally, observe that
line 4 can be replaced with line 5 with no effect on the implemented circuit. The simulation results of
Figure 21.4 obviously also apply to this adder/subtracter.


1  ------ Adder/subtracter with STD_LOGIC_VECTOR: ------
2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  USE ieee.std_logic_signed.all;
5  --USE ieee.std_logic_unsigned.all;
6  -----------------------------------------------------
7  ENTITY adder_subtracter IS
8     GENERIC (N: INTEGER := 8);  --number of input bits
9     PORT (a, b: IN STD_LOGIC_VECTOR(N-1 DOWNTO 0);
10          sum, sub: OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0));
11 END adder_subtracter;
12 -----------------------------------------------------
13 ARCHITECTURE adder_subtracter OF adder_subtracter IS
14 BEGIN
15    sum <= a + b;
16    sub <= a - b;
17 END adder_subtracter;

FIGURE 21.4. Simulation results from the adder/subtracter designed in Section 21.3.
21.4 Signed and Unsigned Multipliers/Dividers


Multipliers and dividers are also combinational arithmetic circuits; their theory was seen in Chapter 3
(Sections 3.4-3.7), and their respective circuits were seen in Chapter 12 (Sections 12.9 and 12.10). Such
circuits are represented symbolically in Figure 21.5, where a, b, and div (=a/b) are N-bit signals, while
prod (=a*b) is 2N bits wide.

Similar to the previous section, we want to illustrate the design of these arithmetic circuits for
different choices of input-output data types. However, as seen in Chapter 3, multipliers/dividers are
designed differently when the system is signed versus when it is unsigned, so the following three cases
will be examined:


■ Unsigned multiplier/divider with I/O signals specified as INTEGER
■ Signed multiplier/divider with I/O signals specified as INTEGER
■ Signed multiplier/divider with I/O signals specified as STD_LOGIC_VECTOR


Unsigned multiplier/divider with INTEGER inputs/outputs 


The corresponding VHDL code is shown below. Because the system is unsigned and the inputs and outputs 
are of type INTEGER, prod and div can be computed directly. Simulation results are depicted in Figure 21.6. 


1  ------ Unsigned mult/div with INTEGER: --------------
2  ENTITY multiplier_divider IS
3     GENERIC (N: INTEGER := 8);  --number of bits
4     PORT (a, b: IN INTEGER RANGE 0 TO 2**N-1;
5           prod: OUT INTEGER RANGE 0 TO 4**N-1;
6           div: OUT INTEGER RANGE 0 TO 2**N-1);
7  END multiplier_divider;
8  -----------------------------------------------------
9  ARCHITECTURE multiplier_divider OF multiplier_divider IS
10 BEGIN
11    prod <= a * b;
12    div <= a / b;
13 END multiplier_divider;
14 -----------------------------------------------------

FIGURE 21.5. Multiplier and divider representations, where a, b, and div are N-bit signals, while prod
is 2N bits wide.

FIGURE 21.6. Simulation results from the unsigned multiplier/divider of Section 21.4.


Signed multiplier/divider with INTEGER inputs/outputs 


The code is shown below. The inputs are first converted from INTEGER to SIGNED (lines 15 and 16) by 
a function called TO_SIGNED available in the numeric_std package (lines 2 and 3). The results are then 
converted back to INTEGER in lines 17 and 18 by a function called TO_INTEGER available in the same 
package. Simulation results are displayed in Figure 21.7 (for N=8, thus with a, b, and div ranging from 
-128 to 127 and prod from -32,768 to 32,767) with the results exhibited in two different ways. 
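
The two conversion functions mentioned above can be tried in isolation before looking at the full design. The simulation-only sketch below applies TO_SIGNED and TO_INTEGER to corner values for N=8. The entity name signed_conv_demo is hypothetical and is not part of the book's design. Note that the signed division in numeric_std truncates toward zero, so -128/127 yields -1.

```vhdl
LIBRARY ieee;
USE ieee.numeric_std.all;

ENTITY signed_conv_demo IS   -- hypothetical name, simulation only
END signed_conv_demo;

ARCHITECTURE sim OF signed_conv_demo IS
   CONSTANT N: INTEGER := 8;
BEGIN
   PROCESS
      VARIABLE a_sig, b_sig: SIGNED(N-1 DOWNTO 0);
      VARIABLE p: SIGNED(2*N-1 DOWNTO 0);   -- a product of two N-bit values has 2N bits
   BEGIN
      a_sig := TO_SIGNED(-128, N);   -- INTEGER to SIGNED
      b_sig := TO_SIGNED(127, N);
      p := a_sig * b_sig;
      ASSERT TO_INTEGER(p) = -16256   -- SIGNED back to INTEGER
         REPORT "unexpected product" SEVERITY ERROR;
      ASSERT TO_INTEGER(a_sig / b_sig) = -1
         REPORT "unexpected quotient" SEVERITY ERROR;
      WAIT;   -- stop after a single pass
   END PROCESS;
END sim;
```

If both assertions stay silent in the simulator, the conversions behave as described in the text.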


1  --- Signed mult/div with INTEGER: -------------------
2  LIBRARY ieee;
3  USE ieee.numeric_std.all;
4  -----------------------------------------------------
5  ENTITY multiplier_divider IS
6     GENERIC (N: INTEGER := 8); --number of bits
7     PORT (a, b: IN INTEGER RANGE -2**(N-1) TO 2**(N-1)-1;
8           prod: OUT INTEGER RANGE -2**(2*N-1) TO 2**(2*N-1)-1;
9           div: OUT INTEGER RANGE -2**(N-1) TO 2**(N-1)-1);
10 END multiplier_divider;
11 -----------------------------------------------------
12 ARCHITECTURE multiplier_divider OF multiplier_divider IS
13    SIGNAL a_sig, b_sig: SIGNED(N-1 DOWNTO 0);
14 BEGIN
15    a_sig <= TO_SIGNED(a, N);
16    b_sig <= TO_SIGNED(b, N);
17    prod <= TO_INTEGER(a_sig * b_sig);
18    div <= TO_INTEGER(a_sig / b_sig);
19 END multiplier_divider;
20 -----------------------------------------------------


FIGURE 21.7. Simulation results from the signed multiplier/divider of Section 21.4. In the upper graph, the
results are displayed in unsigned form, thus requiring the user to "interpret" them. In the second graph,
signed representation was chosen, so the actual results can be viewed directly.


Signed multiplier/divider with STD_LOGIC_VECTOR inputs/outputs 


This solution is similar to that above. The inputs are first converted from STD_LOGIC_VECTOR to SIGNED
(lines 16 and 17) by means of the SIGNED type conversion, with the SIGNED type defined in the
numeric_std package (line 4). The results are then converted to STD_LOGIC_VECTOR in lines 18 and 19.
The simulation results shown in Figure 21.7 are also valid for this code.


1  --- Signed mult/div with STD_LOGIC_VECTOR: -------------
2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  USE ieee.numeric_std.all;
5  --------------------------------------------------------
6  ENTITY multiplier_divider IS
7     GENERIC (N: INTEGER := 8); --number of bits
8     PORT (a, b: IN STD_LOGIC_VECTOR(N-1 DOWNTO 0);
9           prod: OUT STD_LOGIC_VECTOR(2*N-1 DOWNTO 0);
10          div: OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0));
11 END multiplier_divider;
12 --------------------------------------------------------
13 ARCHITECTURE multiplier_divider OF multiplier_divider IS
14    SIGNAL a_sig, b_sig: SIGNED(N-1 DOWNTO 0);
15 BEGIN
16    a_sig <= SIGNED(a);
17    b_sig <= SIGNED(b);
18    prod <= STD_LOGIC_VECTOR(a_sig * b_sig);
19    div <= STD_LOGIC_VECTOR(a_sig / b_sig);
20 END multiplier_divider;
21 --------------------------------------------------------


21.5 ALU 


The ALU (arithmetic-logic unit) was studied in Section 12.8. Its symbol is shown on the left of Figure 21.8, 
while the specifications for the present example are shown on the right (borrowed from Figure 12.17). 
The design of this ALU, using VHDL, is presented below. 

Recall that ALUs are combinational circuits, so concurrent or sequential code can be used. The former
was chosen in this example (see the WHEN statement in the code below, where the WITH-SELECT-WHEN
version was adopted). If the latter were chosen, then CASE would be the most appropriate statement, and
a PROCESS would then be needed, because sequential statements can only be written inside PROCESS,
FUNCTION, or PROCEDURE.
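
To make the sequential alternative concrete, the sketch below describes the same 16 operations with CASE inside a PROCESS, using the opcodes of Figure 21.8. It is only an illustration under two assumptions that are not from the book: the entity name alu_seq is hypothetical, and the numeric_std package (with explicit UNSIGNED conversions) is used for the arithmetic.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;   -- assumption: chosen here for the arithmetic operators

ENTITY alu_seq IS   -- hypothetical name
   PORT (a, b: IN STD_LOGIC_VECTOR(7 DOWNTO 0);
         cin: IN STD_LOGIC;
         opcode: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         y: OUT STD_LOGIC_VECTOR(7 DOWNTO 0));
END alu_seq;

ARCHITECTURE sequential OF alu_seq IS
BEGIN
   PROCESS (a, b, cin, opcode)   -- all inputs listed: the circuit is combinational
      VARIABLE ua, ub, c: UNSIGNED(7 DOWNTO 0);
   BEGIN
      ua := UNSIGNED(a);
      ub := UNSIGNED(b);
      c := (0 => cin, OTHERS => '0');   -- cin extended to 8 bits
      CASE opcode IS
         ----- logic part: -----
         WHEN "0000" => y <= a;
         WHEN "0001" => y <= NOT a;
         WHEN "0010" => y <= b;
         WHEN "0011" => y <= NOT b;
         WHEN "0100" => y <= a AND b;
         WHEN "0101" => y <= a NAND b;
         WHEN "0110" => y <= a OR b;
         WHEN "0111" => y <= a NOR b;
         ----- arithmetic part: -----
         WHEN "1000" => y <= STD_LOGIC_VECTOR(ua + 1);
         WHEN "1001" => y <= STD_LOGIC_VECTOR(ub + 1);
         WHEN "1010" => y <= STD_LOGIC_VECTOR(ua + ub);
         WHEN "1011" => y <= STD_LOGIC_VECTOR(ua - ub);
         WHEN "1100" => y <= STD_LOGIC_VECTOR(ub - ua);
         WHEN "1101" => y <= STD_LOGIC_VECTOR(0 - ua - ub);
         WHEN "1110" => y <= STD_LOGIC_VECTOR(ua + ub + 1);
         WHEN OTHERS => y <= STD_LOGIC_VECTOR(ua + ub + c);
      END CASE;
   END PROCESS;
END sequential;
```

Because y receives a value in every branch of the CASE statement, no latches should be inferred, so the synthesized circuit is expected to remain purely combinational.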
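[ALU symbol: inputs a, b, cin, and opcode; output y]

Unit         Instruction       Operation        opcode
Logic        Transfer a        y = a            0000
Logic        Complement a      y = NOT a        0001
Logic        Transfer b        y = b            0010
Logic        Complement b      y = NOT b        0011
Logic        AND               y = a AND b      0100
Logic        NAND              y = a NAND b     0101
Logic        OR                y = a OR b       0110
Logic        NOR               y = a NOR b      0111
Arithmetic   Increment a       y = a+1          1000
Arithmetic   Increment b       y = b+1          1001
Arithmetic   Add a and b       y = a+b          1010
Arithmetic   Sub b from a      y = a-b          1011
Arithmetic   Sub a from b      y = -a+b         1100
Arithmetic   Add negative      y = -a-b         1101
Arithmetic   Add with 1        y = a+b+1        1110
Arithmetic   Add with carry    y = a+b+cin      1111


FIGURE 21.8. ALU symbol and specifications.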


Note also in the code below that STD_LOGIC_VECTOR was specified as the main signals’ type (lines 7 
and 10). However, the original package for this data type, std_logic_1164 (lines 2 and 3), does not define 
arithmetic operations. For that reason, the package std_logic_unsigned was added to the code (line 4), 
which, as mentioned in Section 19.3, contains arithmetic operations for STD_LOGIC_VECTOR. One 
might consider the use of INTEGER instead because it supports arithmetic operations, but then the 
problem would be with the logical operations, which are not defined in the original package (standard) 
for INTEGER. 


1  ---------------------------------------------------
2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  USE ieee.std_logic_unsigned.all;
5  ---------------------------------------------------
6  ENTITY alu IS
7     PORT (a, b: IN STD_LOGIC_VECTOR(7 DOWNTO 0);
8           cin: IN STD_LOGIC;
9           opcode: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
10          y: OUT STD_LOGIC_VECTOR(7 DOWNTO 0));
11 END alu;
12 ---------------------------------------------------
13 ARCHITECTURE alu OF alu IS
14 BEGIN
15    WITH opcode SELECT
16       ----- logic part: ----------
17       y <= a WHEN "0000",
18            NOT a WHEN "0001",
19            b WHEN "0010",
20            NOT b WHEN "0011",
21            a AND b WHEN "0100",
22            a NAND b WHEN "0101",
23            a OR b WHEN "0110",
24            a NOR b WHEN "0111",
25       ----- arithmetic part: -----
26            a+1 WHEN "1000",
27            b+1 WHEN "1001",
28            a+b WHEN "1010",
29            a-b WHEN "1011",
30            0-a+b WHEN "1100",
31            0-a-b WHEN "1101",
32            a+b+1 WHEN "1110",
33            a+b+cin WHEN OTHERS;
34 END alu;
35 ---------------------------------------------------


Simulation results are shown in Figure 21.9. The values a="00010001" (=17) and b="01010001" (=81) 
were adopted for the inputs, and seven opcodes were tested for which the results below are expected. 
Note in Figure 21.9 that the correct results occur. 


opcode="0100" → y=a AND b="00010001" (=17)
opcode="0110" → y=a OR b="01010001" (=81)
opcode="0111" → y=a NOR b="10101110" (=174, or -82 in signed notation)
opcode="1000" → y=a+1="00010010" (=18)
opcode="1010" → y=a+b="01100010" (=98)
opcode="1111" → y=a+b+cin="01100011" (=99)
opcode="1101" → y=-a-b="10011110" (=-98, or 158 in unsigned notation)
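
The seven expected results listed above can also be checked automatically rather than by reading waveforms. The self-checking testbench below is a minimal sketch: the name alu_tb is hypothetical, the alu entity of this section is assumed to be compiled into library work, and cin='1' is assumed because the expected result for opcode "1111" is 99.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY alu_tb IS   -- hypothetical testbench name
END alu_tb;

ARCHITECTURE sim OF alu_tb IS
   SIGNAL a: STD_LOGIC_VECTOR(7 DOWNTO 0) := "00010001";   -- 17
   SIGNAL b: STD_LOGIC_VECTOR(7 DOWNTO 0) := "01010001";   -- 81
   SIGNAL cin: STD_LOGIC := '1';   -- assumed, so that a+b+cin = 99
   SIGNAL opcode: STD_LOGIC_VECTOR(3 DOWNTO 0) := "0000";
   SIGNAL y: STD_LOGIC_VECTOR(7 DOWNTO 0);
BEGIN
   -- Design under test: the ALU of this section, assumed to be in library work
   dut: ENTITY work.alu
      PORT MAP (a => a, b => b, cin => cin, opcode => opcode, y => y);

   PROCESS
      -- Applies one opcode, waits for the output to settle, then compares
      PROCEDURE check (op: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
                       expected: IN STD_LOGIC_VECTOR(7 DOWNTO 0)) IS
      BEGIN
         opcode <= op;
         WAIT FOR 10 ns;
         ASSERT y = expected
            REPORT "ALU result differs from the expected value" SEVERITY ERROR;
      END PROCEDURE check;
   BEGIN
      check("0100", "00010001");   -- a AND b = 17
      check("0110", "01010001");   -- a OR b = 81
      check("0111", "10101110");   -- a NOR b = 174
      check("1000", "00010010");   -- a+1 = 18
      check("1010", "01100010");   -- a+b = 98
      check("1111", "01100011");   -- a+b+cin = 99
      check("1101", "10011110");   -- -a-b = 158 (unsigned)
      WAIT;   -- end of test
   END PROCESS;
END sim;
```

A run without assertion messages indicates that the ALU produces the listed values.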


FIGURE 21.9. Simulation results from the ALU designed in Section 21.5.


21.6 Exercises


In all exercises below, the VHDL code must be written, compiled, debugged, and then carefully 
simulated. 


1. Incrementer 


a. Write a VHDL code for the circuit of Figure 12.14(b), then simulate it to verify whether it is an 
incrementer. 


b. Rewrite the code, this time for a generic value of N (GENERATE or LOOP can be used). 
2. Decrementer 


a. Write a VHDL code for the circuit of Figure 12.14(c), then simulate it to verify whether it is a 
decrementer. 


b. Rewrite the code, this time for a generic value of N (GENERATE or LOOP can be used). 
3. Two’s complementer 


a. Write a VHDL code for the circuit of Figure 12.14(d), then simulate it to verify whether it is a 
two’s complementer. 


b. Rewrite the code, this time for a generic value of N (GENERATE or LOOP can be used). 
4. Unsigned comparator 


Comparators are combinational arithmetic circuits, also studied in Chapter 12 (Section 12.7).
A diagram for that type of circuit is depicted in Figure E21.4, which contains two inputs (a, b)
and three outputs (x1, x2, x3). It must produce x1='1' when a>b, x2='1' when a=b, and x3='1' when
a<b. Write a VHDL code from which this circuit can be inferred, assuming that the system is
unsigned.


[Diagram: comparator with inputs a(7:0) and b(7:0) and outputs x1, x2, x3]

FIGURE E21.4.


5. Signed comparator 
Repeat the exercise above, now assuming that the inputs are signed. 
6. Magnitude comparator 


Using VHDL, design a magnitude comparator that contains two N-bit inputs (a, b) and a single-bit
output (x). Assume that the system is signed, so x='1' must occur when a=b or a=-b. Enter N using
the GENERIC statement.


7. Signed and unsigned adders/subtracters #1 


In the adder/subtracter design of Section 21.3, two solutions were presented, one for INTEGER I/Os and 
the other for STD_LOGIC_VECTOR I/Os. Present another solution, this time with STD_LOGIC_VECTOR 
inputs and INTEGER outputs. 

8. Signed and unsigned adders/subtracters #2


Reciprocally to the exercise above, present another solution for the adder/subtracter using INTEGER
inputs and STD_LOGIC_VECTOR outputs.


9. Signed and unsigned multipliers/dividers #1


In the multiplier/divider design of Section 21.4, three solutions were presented, one unsigned with
INTEGER I/Os, another signed for INTEGER I/Os, and finally another signed for STD_LOGIC_VECTOR
I/Os. Present a new solution, this time for a signed system with STD_LOGIC_VECTOR inputs and
INTEGER outputs.


10. Signed and unsigned multipliers/dividers #2


Reciprocally to the exercise above, present another solution for the signed multiplier/divider using
INTEGER inputs and STD_LOGIC_VECTOR outputs.


11. ALU

The questions below regard the ALU designed in Section 21.5.

a. What should be changed in the VHDL code if the opcode (line 9) were specified as INTEGER?

b. What happens to that solution if the std_logic_unsigned package (line 4) is not included?

c. Find another package that could be employed in place of std_logic_unsigned without affecting
the synthesized circuit. (Suggestion: see examples in Section 21.3.)
VHDL Design of Sequential Circuits


Objective: Sequential circuits were studied in Chapters 13 to 15, with regular circuits in Chapters 
13 and 14 and finite state machine (FSM)-based circuits in Chapter 15. The same division is made in 
the VHDL examples, with regular sequential designs presented in this chapter and FSM-based designs 
shown in the next. This chapter closes with a larger example in which the design of neural networks is 
illustrated. 


Chapter Contents 


22.1 Shift Register with Load 
22.2 Switch Debouncer 

22.3 Timer 

22.4 Fibonacci Series Generator 
22.5 Frequency Meters 

22.6 Neural Networks 

22.7 Exercises 


22.1 Shift Register with Load 


Figure 22.1 shows an M-stage N-bit shift register (SR) with load capability (this circuit was studied in 
Section 14.1). When load='1', vector x must be loaded into the SR at the next rising clock edge, while 
for load ='0' the circuit must operate as a regular SR. We illustrate in this section the design of such SR 
under the following two premises: (i) M and N generic and (ii) employing a structural design approach 
(with COMPONENT used to instantiate the multiplexers and flip-flop banks). 

A VHDL code for this circuit is shown below. Because M and N must be arbitrary values, a user-defined
data type is needed for x because none of the predefined types (Figure 19.4) satisfies the present
need. Because such a type is needed at the beginning of the main code (in the ENTITY; see line 8 of the
main code), it was specified in a PACKAGE (called my_package), which is made visible to the design by
means of line 2 of the main code.

The multiplexer and flip-flop bank were designed separately because they are intended to be called
using the COMPONENT construct in the main code (as seen in Section 19.13, a COMPONENT is simply a
predesigned piece of regular VHDL code). Note that these two codes (multiplexer and ff_bank) are also
generic, so their generic parameters must be overwritten by the main code (see GENERIC MAP in lines 34
and 36 of the main code). The GENERATE statement was employed to create M instances of these units
(lines 33-38). Note that the assignments adopted in GENERIC MAP and PORT MAP are all positional
(Section 19.13). Corresponding simulation results are depicted in Figure 22.2.


FIGURE 22.1. M-stage N-bit shift register with load capability.

FIGURE 22.2. Simulation results from the VHDL code for the shift register of Figure 22.1.


----- Package: -------------------------------------
PACKAGE my_package IS
   CONSTANT bits: POSITIVE := 8;
   TYPE x_input IS ARRAY (NATURAL RANGE <>) OF BIT_VECTOR(bits-1 DOWNTO 0);
END my_package;
----------------------------------------------------


----- Multiplexer (a component): -------------------
ENTITY multiplexer IS
   GENERIC (bits: POSITIVE);
   PORT (inp1, inp2: IN BIT_VECTOR(bits-1 DOWNTO 0);
         sel: IN BIT;
         outp: OUT BIT_VECTOR(bits-1 DOWNTO 0));
END multiplexer;
----------------------------------------------------
ARCHITECTURE multiplexer OF multiplexer IS
BEGIN
   outp <= inp1 WHEN sel='0' ELSE inp2;
END multiplexer;
----------------------------------------------------


----- ff_bank (a component): -----------------------
ENTITY ff_bank IS
   GENERIC (bits: POSITIVE);
   PORT (d: IN BIT_VECTOR(bits-1 DOWNTO 0);
         clk: IN BIT;
         q: OUT BIT_VECTOR(bits-1 DOWNTO 0));
END ff_bank;
----------------------------------------------------
ARCHITECTURE ff_bank OF ff_bank IS
BEGIN
   PROCESS (clk)
   BEGIN
      IF (clk'EVENT AND clk='1') THEN
         q <= d;
      END IF;
   END PROCESS;
END ff_bank;
----------------------------------------------------
1  ----- Main code: -----------------------------------------
2  USE work.my_package.all;
3  ----------------------------------------------------------
4  ENTITY shift_register IS
5     GENERIC (M: INTEGER := 4; --# of stages
6              N: INTEGER := 8); --# of bits
7     PORT (clk, load: IN BIT;
8           x: IN x_input(1 TO M);
9           d: IN BIT_VECTOR(N-1 DOWNTO 0);
10          q: OUT BIT_VECTOR(N-1 DOWNTO 0));
11 END shift_register;
12 ----------------------------------------------------------
13 ARCHITECTURE structural OF shift_register IS
14    SIGNAL temp1: x_input(0 TO M);
15    SIGNAL temp2: x_input(1 TO M);
16    -------------------------------------------------------
17    COMPONENT multiplexer IS
18       GENERIC (bits: POSITIVE);
19       PORT (inp1, inp2: IN BIT_VECTOR(bits-1 DOWNTO 0);
20             sel: IN BIT;
21             outp: OUT BIT_VECTOR(bits-1 DOWNTO 0));
22    END COMPONENT;
23    -------------------------------------------------------
24    COMPONENT ff_bank IS
25       GENERIC (bits: POSITIVE);
26       PORT (d: IN BIT_VECTOR(bits-1 DOWNTO 0);
27             clk: IN BIT;
28             q: OUT BIT_VECTOR(bits-1 DOWNTO 0));
29    END COMPONENT;
30    -------------------------------------------------------
31 BEGIN
32    temp1(0) <= d;
33    g: FOR i IN 1 TO M GENERATE
34       mux: multiplexer GENERIC MAP (N)
35            PORT MAP (temp1(i-1), x(i), load, temp2(i));
36       ff: ff_bank GENERIC MAP (N)
37           PORT MAP (temp2(i), clk, temp1(i));
38    END GENERATE g;
39    q <= temp1(M);
40 END structural;
41 ----------------------------------------------------------
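
For comparison with the structural solution, the same function can be sketched behaviorally in a single clocked PROCESS. This is only an illustrative alternative, not the approach adopted in this section. The names sr_types and shift_register_beh are hypothetical, and the word width is taken directly from the package constant bits (in the structural code, N is assumed to equal bits).

```vhdl
PACKAGE sr_types IS   -- hypothetical package, mirrors my_package
   CONSTANT bits: POSITIVE := 8;
   TYPE x_input IS ARRAY (NATURAL RANGE <>) OF BIT_VECTOR(bits-1 DOWNTO 0);
END sr_types;

USE work.sr_types.all;

ENTITY shift_register_beh IS   -- hypothetical name
   GENERIC (M: INTEGER := 4);   --# of stages
   PORT (clk, load: IN BIT;
         x: IN x_input(1 TO M);
         d: IN BIT_VECTOR(bits-1 DOWNTO 0);
         q: OUT BIT_VECTOR(bits-1 DOWNTO 0));
END shift_register_beh;

ARCHITECTURE behavioral OF shift_register_beh IS
   SIGNAL stage: x_input(1 TO M);   -- one N-bit register per stage
BEGIN
   PROCESS (clk)
   BEGIN
      IF (clk'EVENT AND clk='1') THEN
         IF (load='1') THEN
            stage <= x;              -- parallel load of all stages
         ELSE
            stage(1) <= d;           -- serial input enters stage 1
            FOR i IN 2 TO M LOOP
               stage(i) <= stage(i-1);   -- regular shift
            END LOOP;
         END IF;
      END IF;
   END PROCESS;
   q <= stage(M);
END behavioral;
```

Each stage here plays the role of one multiplexer plus one flip-flop bank of Figure 22.1, so the two descriptions are expected to behave identically at the ports.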


22.2 Switch Debouncer 


A switch debouncer was described in Exercise 15.19, which is related to the diagram repeated in 
Figure 22.3(a). When we press or change the position of a mechanical switch, bounces are expected 
to occur before the switch finally settles in the desired position, so in actual designs this type 
of switch must be debounced. This can be done analogically (for example, with RC circuits and, 
optionally, Schmitt triggers) or digitally. In the latter case, a minimum number of clock cycles are 
counted to guarantee that the switch has been in the same position for at least a certain amount of 
time (for example, 10 milliseconds). 

In this design the following debouncing criteria are adopted:


■ Switch closed (y='0'): x must stay low for at least 10 ms without interruption.
■ Switch open (y='1'): x must stay high for at least 10 ms without interruption.


To make the design generic, the time window (twindow=10 ms) and the clock frequency (fclk) are
entered using the GENERIC statement. With these two parameters, the CONSTANT max=twindow*fclk
is defined (in the declarative part of the ARCHITECTURE), which can be used as a reference to reset
the counter. In the simulations, a low frequency (fclk=1 kHz) will be used to ease the visualization of
the results.

This example will be divided into three parts:


■ A preliminary circuit to implement this function will be sketched without VHDL.
■ Then the number of flip-flops needed to construct it will be estimated.
■ Finally, a generic design, using VHDL, will be presented.


FIGURE 22.3. Switch debouncer: (a) Top-level diagram; (b) A possible solution.

Preliminary circuit sketch


A possible solution is depicted in Figure 22.3(b). It contains a counter, which is allowed to count when
x≠y and is cleared when x=y occurs. If the count reaches the specified maximum (max), it causes the
output DFF to toggle, hence storing x. Subsequently, the counter is cleared because x=y then results,
after which the counter is ready to start a new debouncing cycle.


Estimate for the number of flip-flops


The estimated number of DFFs is one to store y plus n for the counter, where $n = \lceil \log_2 max \rceil$.
For example, if fclk=25 MHz and twindow=10 ms, then the counter must count up to max=250k, so 18
flip-flops are needed (for the counter), hence totaling 19 DFFs.
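
The estimate above can be reproduced with a small helper function, sketched below under stated assumptions. The names ff_estimate, ceil_log2, and ff_estimate_check are hypothetical, and the function is only meant for elaboration-time or simulation-time arithmetic, with fclk given in kHz and twindow in ms as in the code of this section.

```vhdl
PACKAGE ff_estimate IS   -- hypothetical package name
   FUNCTION ceil_log2 (value: POSITIVE) RETURN NATURAL;
END ff_estimate;

PACKAGE BODY ff_estimate IS
   -- Smallest n such that 2**n >= value
   FUNCTION ceil_log2 (value: POSITIVE) RETURN NATURAL IS
      VARIABLE n: NATURAL := 0;
      VARIABLE span: POSITIVE := 1;   -- span = 2**n
   BEGIN
      WHILE span < value LOOP   -- assumes value <= 2**30 to avoid INTEGER overflow
         span := span * 2;
         n := n + 1;
      END LOOP;
      RETURN n;
   END ceil_log2;
END ff_estimate;

USE work.ff_estimate.all;

ENTITY ff_estimate_check IS   -- hypothetical name, simulation only
END ff_estimate_check;

ARCHITECTURE sim OF ff_estimate_check IS
   CONSTANT fclk: POSITIVE := 25_000;   -- clock frequency in kHz (25 MHz)
   CONSTANT twindow: POSITIVE := 10;    -- time window in ms
   CONSTANT max: POSITIVE := fclk * twindow;   -- 250,000 clock cycles
BEGIN
   ASSERT ceil_log2(max) = 18
      REPORT "counter estimate differs from 18 flip-flops" SEVERITY ERROR;
   ASSERT ceil_log2(max) + 1 = 19
      REPORT "total estimate differs from 19 flip-flops" SEVERITY ERROR;
END sim;
```

The loop doubles span until it reaches or exceeds max; since 2**17 < 250,000 <= 2**18, it returns 18, in agreement with the text.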


VHDL code 


A VHDL code for the debouncer is presented below, with fclk and twindow entered as generic parameters
(lines 6 and 7). The code proper (ARCHITECTURE) contains a counter (variable count) plus an assignment
from x to y (line 24). Recall from rule 6 of Figure 19.9 that when a value is assigned to a signal (or
variable) at the transition of another signal, registers are inferred. In our case, the registers for the counter
are inferred in line 21 because values are assigned to count at the transition of clk (line 19). Likewise,
the output DFF is inferred in line 24 because a value is assigned to y at the transition of another signal
(again clk).


1  ----------------------------------------------------------
2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  ----------------------------------------------------------
5  ENTITY debouncer IS
6     GENERIC (fclk: INTEGER := 1; --clock freq in kHz
7              twindow: INTEGER := 10); --time window in ms
8     PORT (x: IN STD_LOGIC;
9           clk: IN STD_LOGIC;
10          y: BUFFER STD_LOGIC);
11 END debouncer;
12 ----------------------------------------------------------
13 ARCHITECTURE debouncer OF debouncer IS
14    CONSTANT max: INTEGER := fclk * twindow;
15 BEGIN
16    PROCESS (clk)
17       VARIABLE count: INTEGER RANGE 0 TO max;
18    BEGIN
19       IF (clk'EVENT AND clk='1') THEN
20          IF (y /= x) THEN
21             count := count + 1;
22             IF (count=max) THEN
23                count := 0;
24                y <= x;
25             END IF;
26          ELSE
27             count := 0;
28          END IF;
29       END IF;
30    END PROCESS;
31 END debouncer;
32 ----------------------------------------------------------


Simulation results are depicted in Figure 22.4. Because fclk=1 kHz and twindow=10 ms were employed
(so max=10), the switch must stay in the same position for at least 10 positive clock edges for it to be
considered valid.

The reader is invited to compile this code and check whether the actual number of flip-flops inferred 
by the compiler matches the prediction made above. 
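
The 10-edge criterion can also be exercised with a short self-checking testbench. The sketch below is only an illustration: the name debouncer_tb and the stimulus times are assumptions, and the debouncer entity above is assumed to be compiled into library work. With a 1 kHz clock whose rising edges occur at 0.5 ms, 1.5 ms, and so on, a 5 ms pulse on x should be ignored, while a level held for more than 10 ms should reach y.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY debouncer_tb IS   -- hypothetical testbench name
END debouncer_tb;

ARCHITECTURE sim OF debouncer_tb IS
   SIGNAL x: STD_LOGIC := '0';
   SIGNAL clk: STD_LOGIC := '0';
   SIGNAL y: STD_LOGIC;
   SIGNAL done: BOOLEAN := FALSE;
BEGIN
   -- 1 kHz clock (toggles every 0.5 ms) that stops once the test is done
   clk <= NOT clk AFTER 500 us WHEN NOT done ELSE '0';

   dut: ENTITY work.debouncer
      GENERIC MAP (fclk => 1, twindow => 10)   -- kHz and ms, so max=10
      PORT MAP (x => x, clk => clk, y => y);

   PROCESS
   BEGIN
      WAIT FOR 12 ms;   -- x low for more than 10 clock edges
      ASSERT y = '0' REPORT "y should have settled at '0'" SEVERITY ERROR;
      x <= '1';         -- bounce: only 5 rising edges see x high
      WAIT FOR 5 ms;
      x <= '0';
      WAIT FOR 3 ms;
      ASSERT y = '0' REPORT "short pulse should be ignored" SEVERITY ERROR;
      x <= '1';         -- valid change: held for more than 10 edges
      WAIT FOR 11 ms;
      ASSERT y = '1' REPORT "y should follow the stable input" SEVERITY ERROR;
      done <= TRUE;
      WAIT;
   END PROCESS;
END sim;
```

Note that y starts undefined ('U') in simulation because the port has no initial value, so the first 12 ms also serve to initialize the output.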


22.3 Timer 


A two-digit timer is depicted in Figure 22.5 (a similar circuit was studied in Example 14.11). The system
is composed of three sections: counter, display drivers, and the display proper (with two SSDs, or
seven-segment displays, detailed in Example 11.4).

The counter is the sequential part of the system. It must count seconds from 00 to 60, starting whenever
the enable (ena) input is asserted, and stopping whenever 60 is reached or the enable switch is turned
OFF. It must also have an asynchronous reset switch that zeros the system. If 60 is reached, besides
stopping, the full_count output must be asserted.

The SSD driver (BCD-to-SSD converter) is the combinational part of the system (seen in Section 20.2). It 
must convert the 4-bit outputs from the counters (count1, count2) into 7-bit signals (dig1, dig2) to feed the 
two-digit display (assume that these are common-cathode SSDs—see Figure 11.13). 

This example will be divided into two parts: 


■ The number of flip-flops needed to implement the timer will be estimated.
■ Then a generic design, using VHDL, will be developed.


Estimate for the number of flip-flops 


The number of flip-flops is $\lceil \log_2 fclk \rceil + 4 + 3$ (the first term is to reduce the frequency to 1 Hz,
the second is for the 0-to-9 counter of dig1, and the third for the 0-to-6 counter of dig2). For example, for
fclk=10 Hz, 1 kHz, or 1 MHz, the expected number of DFFs is 11, 17, or 27, respectively.


VHDL code 


A VHDL code for this problem is shown below, where fclk was entered using the GENERIC statement
(line 6), with a small default value used to ease the visualization of the simulation results.
The sequential part (counters) is in lines 19-39 and was designed using IF, while the combinational
part (SSD driver) is in lines 41-63 and was designed with CASE. As can be seen in lines 15-17, three
variables were employed to implement the counters; the first normalizes the frequency to 1 Hz, the second
feeds dig1, and the third feeds dig2.


1  ----------------------------------------------------------
2  LIBRARY ieee;
3  USE ieee.std_logic_1164.all;
4  ----------------------------------------------------------
5  ENTITY timer IS
6     GENERIC (fclk: INTEGER := 2); --clock frequency
7     PORT (clk, rst, ena: IN STD_LOGIC;
8           full_count: OUT STD_LOGIC;
9           dig1, dig2: OUT STD_LOGIC_VECTOR(6 DOWNTO 0));
10 END timer;
11 ----------------------------------------------------------
12 ARCHITECTURE timer OF timer IS
13 BEGIN
14    PROCESS (clk, rst, ena)
15       VARIABLE count0: INTEGER RANGE 0 TO fclk; --for 1Hz
16       VARIABLE count1: INTEGER RANGE 0 TO 10; --for dig1
17       VARIABLE count2: INTEGER RANGE 0 TO 7; --for dig2
18    BEGIN
19       ----- counter: -------------------------------------
20       IF (rst='1') THEN
21          count0 := 0;
22          count1 := 0;
23          count2 := 0;
24          full_count <= '0';
25       ELSIF (count1=0 AND count2=6) THEN
26          full_count <= '1';
27       ELSIF (clk'EVENT AND clk='1') THEN
28          IF (ena='1') THEN
29             count0 := count0 + 1;
30             IF (count0=fclk) THEN
31                count0 := 0;
32                count1 := count1 + 1;
33                IF (count1=10) THEN
34                   count1 := 0;
35                   count2 := count2 + 1;
36                END IF;
37             END IF;
38          END IF;
39       END IF;
40       ----- BCD to SSD conversion: -----------------------
41       CASE count1 IS
42          WHEN 0 => dig1 <= "1111110"; --126
43          WHEN 1 => dig1 <= "0110000"; --48
44          WHEN 2 => dig1 <= "1101101"; --109
45          WHEN 3 => dig1 <= "1111001"; --121
46          WHEN 4 => dig1 <= "0110011"; --51
47          WHEN 5 => dig1 <= "1011011"; --91
48          WHEN 6 => dig1 <= "1011111"; --95
49          WHEN 7 => dig1 <= "1110000"; --112
50          WHEN 8 => dig1 <= "1111111"; --127
51          WHEN 9 => dig1 <= "1111011"; --123
52          WHEN OTHERS => NULL;
53       END CASE;
54       CASE count2 IS
55          WHEN 0 => dig2 <= "1111110"; --126
56          WHEN 1 => dig2 <= "0110000"; --48
57          WHEN 2 => dig2 <= "1101101"; --109
58          WHEN 3 => dig2 <= "1111001"; --121
59          WHEN 4 => dig2 <= "0110011"; --51
60          WHEN 5 => dig2 <= "1011011"; --91
61          WHEN 6 => dig2 <= "1011111"; --95
62          WHEN OTHERS => NULL;
63       END CASE;
64    END PROCESS;
65 END timer;
66 ----------------------------------------------------------


FIGURE 22.4. Simulation results from the VHDL code for the debouncer of Figure 22.3.

FIGURE 22.5. Two-digit timer.

Simulation results (just a small fraction) are depicted in Figure 22.6. The reader is invited to compile
this code and check whether the actual number of flip-flops inferred by the compiler matches the
prediction made above.


FIGURE 22.6. Simulation results from the VHDL code for the timer of Figure 22.5.


22.4 Fibonacci Series Generator


In the 12th century, Leonardo Fibonacci discovered a numeric series with interesting mathematical
properties, like the quick convergence of the ratio of any two consecutive elements to phi
($\phi = (1+\sqrt{5})/2 = 1.61803398\ldots$), a number that became known as the golden ratio or the divine
proportion (allegedly used, for example, in da Vinci's Mona Lisa and in the dimensions of Egyptian pyramids).

The Fibonacci series, F(n), starts with F(0)=0 and F(1)=1, with each new value obtained by summing
the two preceding elements, thus resulting in F(n)={0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, ...} (in closed
form, $F(n) = [\phi^n - (1-\phi)^n]/\sqrt{5}$). We want to design a circuit that generates this series with a new value
displayed at every positive clock transition. Though this could be done with a simple memory (lookup table),
which would store the series, to reduce the amount of hardware an actual generator will be implemented.

The design will be divided into three parts: 


■ A preliminary circuit to implement this function will be sketched without VHDL.
■ Then the number of flip-flops needed to construct it, assuming that N=16 bits are used to represent
the series, will be estimated.
■ And finally, a generic design, using VHDL, will be developed.


Preliminary circuit sketch 


A possible solution for this problem is presented in Figure 22.7. The circuit contains two N-bit registers
(A, B) plus an N-bit adder. Note that the reset signal is connected to rst of register B (so its initial state
is c="00...000"=decimal 0), but is connected to rst/pre of register A (it must be connected to the preset
input of only the LSB flip-flop, such that its initial state then is b="000...001"=decimal 1). Therefore,
after initialization, the situation is c=0, b=1, and a=b+c=1. At the next (positive) clock edge, c=1, b=1,
and a=2 result. Next, c=1, b=2, and a=3, and so on.


Estimate for the number of flip-flops 


The number of flip-flops is simply 2N. Note that with N=16, the largest Fibonacci element is 46,368, 
which occurs for n=24. 
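
The limit quoted above can be confirmed with a short simulation-only sketch that iterates the series until the next element no longer fits in N bits. The entity name fibo_limit_check is hypothetical, and the code relies on 32-bit INTEGER arithmetic, so it is intended for N up to 30.

```vhdl
ENTITY fibo_limit_check IS   -- hypothetical name, simulation only
END fibo_limit_check;

ARCHITECTURE sim OF fibo_limit_check IS
   CONSTANT N: INTEGER := 16;   -- number of bits
BEGIN
   PROCESS
      VARIABLE prev: INTEGER := 0;   -- F(n-1), starts at F(0)
      VARIABLE curr: INTEGER := 1;   -- F(n), starts at F(1)
      VARIABLE idx: INTEGER := 1;    -- n
      VARIABLE nxt: INTEGER;
   BEGIN
      LOOP
         nxt := prev + curr;          -- F(n+1)
         EXIT WHEN nxt > 2**N-1;      -- next element would not fit in N bits
         prev := curr;
         curr := nxt;
         idx := idx + 1;
      END LOOP;
      ASSERT curr = 46368 AND idx = 24
         REPORT "largest 16-bit element differs from F(24)=46368" SEVERITY ERROR;
      REPORT "largest element: F(" & INTEGER'IMAGE(idx) & ") = " & INTEGER'IMAGE(curr);
      WAIT;
   END PROCESS;
END sim;
```

The loop stops because F(25)=75,025 exceeds 2**16-1=65,535, leaving F(24)=46,368 as the largest representable element.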


VHDL code 


A VHDL code based on the circuit of Figure 22.7 is shown below. Because STD_LOGIC was not
employed, extra library declarations are not needed. N was entered using GENERIC (line 3). Reset is
asynchronous and causes b=1 and c=0 (lines 13-15). Because values are assigned to signals (b and c,
lines 17 and 18) at the transition of another signal (clk, line 16), flip-flops are expected to be inferred (rule
6 of Figure 19.9).


FIGURE 22.7. Fibonacci series generator.

FIGURE 22.8. Simulation results from the VHDL code for the Fibonacci generator of Figure 22.7.


1  ----------------------------------------------------------
2  ENTITY fibonacci IS
3     GENERIC (N: INTEGER := 16); --number of bits
4     PORT (clk, rst: IN BIT;
5           fibo_series: OUT INTEGER RANGE 0 TO 2**N-1);
6  END fibonacci;
7  ----------------------------------------------------------
8  ARCHITECTURE fibonacci OF fibonacci IS
9     SIGNAL a, b, c: INTEGER RANGE 0 TO 2**N-1;
10 BEGIN
11    PROCESS (clk, rst)
12    BEGIN
13       IF (rst='1') THEN
14          b <= 1;
15          c <= 0;
16       ELSIF (clk'EVENT AND clk='1') THEN
17          c <= b;
18          b <= a;
19       END IF;
20       a <= b + c;
21    END PROCESS;
22    fibo_series <= c;
23 END fibonacci;
24 ----------------------------------------------------------


Simulation results are depicted in Figure 22.8. The reader is invited to compile this code and 
check whether the actual number of flip-flops inferred by the compiler matches the prediction made 
above. 


22.5 Frequency Meters 


The design of frequency meters was introduced in Exercises 14.45 and 14.46, where two approaches were 
described. We want to proceed, now using VHDL. In what follows, clk is the system clock, x is the signal 
whose frequency we want to measure, and fclk and fx are their respective frequencies. 

Two measurement approaches are considered (note that one is the reciprocal of the other):


i. From the clock, a time window is created (with the duration of 1s, for example), and the number of 
pulses in x within it is counted. 


ii. The period of x (or several periods) is used as the time window, and the number of clock pulses 
within it is counted. 


This exercise will be divided into two parts: 


■ A preliminary circuit for each approach will be sketched without VHDL, and related comments will
be presented.
■ Then a VHDL code for approach (i) will be developed.


(For the circuit of approach (ii), see Exercise 22.4.) 


Preliminary circuit sketch 


A general architecture for approach (i) is shown in Figure 22.9(a). counter1 divides the clock frequency
by n+1, creating a waveform that stays low during $nT_0$ seconds and high during $T_0$ seconds, where $T_0$
is the clock period. Choosing n=fclk, a 1 s time window (twindow) is obtained. This waveform causes the
output of counter2 to be stored in the register at its rising edge, and it also resets counter2, which is
released to start counting again after one clock period. The inactivity factor is therefore 1/(fclk+1)≈0.
This approach is accurate for large fx.

A general architecture for approach (ii) is shown in Figure 22.9(b). The circuit operates in the opposite
way relative to that above, that is, it counts clock pulses instead of pulses in x, with the latter playing the
role of time window and used also for resetting the counter. Note that the frequency of x is divided by 2,
giving rise to y; this operation prevents incorrect measurements when the duty cycle of x is not 50%. The
inactivity factor of this circuit is poor (0.5); one way of improving it is with the use of two counters (more
hardware), one active while y='0', the other when y='1'. Contrary to Figure 22.9(a), in Figure 22.9(b)
fx is not obtained directly; a divider is needed, which computes fx=fclk/m. This approach is therefore
accurate for low frequencies. As pointed out in Exercise 14.46, fx should be limited to fx≤e·fclk, where
e is the maximum error acceptable in the measurement (for example, if fclk=25 MHz and e=0.1%, then
fx must be limited to 25 kHz). Because of the divider, the hardware in this approach is more complex
than that in approach (i). However, if the measurement of $T_x$ (the period of x) were wanted instead of fx,
then the result would be available directly.

Finally, to obtain an approach that is fine for high and for low frequencies, programmability must be
incorporated into the system. For example, if approach (i) is chosen and fx is small, then the time
window should be widened, with the widening factor obviously taken into consideration to always display
a normalized value. Moreover, the refresh rate of the display must also be considered; for high
frequencies, the time window can be shortened so the display can be refreshed more frequently (for example,
every few milliseconds); however, for very low frequencies (sub-Hz), the refresh rate has to be either low
(every few seconds) or based on averaged (accumulated) measurements.


FIGURE 22.9. General frequency-meter architectures. In (a), pulses of x within a clock-based time window are
counted, while in (b), clock pulses within one period of x are counted.

FIGURE 22.10. Simulation results from the VHDL code for the frequency meter of Figure 22.9(a).


VHDL code 


A VHDL code for the circuit of Figure 22.9(a) is presented below. Note the presence of two generic 
parameters in lines 3 (fclk=clock frequency) and 4 (fxmax=maximum frequency to be measured). An 
additional signal (test, line 6) was included in the ENTITY to visualize the signal twindow. 

Because all three blocks of Figure 22.9(a) contain flip-flops, and in all three a different signal is connected
to the clock input, three separate processes were employed. The first process (lines 15-27) generates
the time window; the second (lines 29-38) counts the pulses of x; and the third (lines 40-45) infers
the output register. Simulation results are depicted in Figure 22.10.


1   --------------------------------------------------
2   ENTITY freq_meter IS
3      GENERIC (fclk: INTEGER := 5;     --clock freq.
4               fxmax: INTEGER := 15);  --max fx
5      PORT (clk, x: IN BIT;
6            test: OUT BIT;
7            fx: OUT INTEGER RANGE 0 TO fxmax);
8   END freq_meter;
9   --------------------------------------------------
10  ARCHITECTURE behavioral OF freq_meter IS
11     SIGNAL twindow: BIT;
12     SIGNAL temp: INTEGER RANGE 0 TO fxmax;
13  BEGIN
14     ----- Time window: ----------------------------
15     PROCESS (clk)
16        VARIABLE count: INTEGER RANGE 0 TO fclk;
17     BEGIN
18        IF (clk'EVENT AND clk='1') THEN
19           count := count + 1;
20           IF (count=fclk) THEN
21              twindow <= '1';
22           ELSIF (count=fclk+1) THEN
23              twindow <= '0';
24              count := 0;
25           END IF;
26        END IF;
27     END PROCESS;
28     ----- Counter for x: --------------------------
29     PROCESS (x, twindow)
30        VARIABLE count: INTEGER RANGE 0 TO 20;
31     BEGIN
32        IF (twindow='1') THEN
33           count := 0;
34        ELSIF (x'EVENT AND x='1') THEN
35           count := count + 1;
36        END IF;
37        temp <= count;
38     END PROCESS;
39     ----- Register: -------------------------------
40     PROCESS (twindow)
41     BEGIN
42        IF (twindow'EVENT AND twindow='1') THEN
43           fx <= temp;
44        END IF;
45     END PROCESS;
46     test <= twindow;
47  END behavioral;
48  --------------------------------------------------
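
To get a feel for the time-window process (lines 15-27), its counting can be mimicked in a few lines of Python. This is only a behavioral sketch of that one process, with one loop iteration per rising clock edge; the function name and the recorded trace are illustrative assumptions, not part of the VHDL design.

```python
def time_window(fclk: int, n_edges: int) -> list[int]:
    """Value of twindow after each rising clock edge (model of the first process)."""
    count = 0
    twindow = 0
    trace = []
    for _ in range(n_edges):
        count += 1
        if count == fclk:
            twindow = 1          # window pulse starts
        elif count == fclk + 1:
            twindow = 0          # pulse ends after one clock period
            count = 0
        trace.append(twindow)
    return trace

# With the default generic fclk = 5 the pattern repeats every fclk + 1 = 6 edges
print(time_window(fclk=5, n_edges=12))
```

In this model twindow is high for one clock period and low for fclk periods, which is the interval during which the second process counts the pulses of x.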


22.6 Neural Networks 


We conclude this chapter by presenting a much larger VHDL design, which consists of a neural network. 
To save hardware, only integers (instead of floating-point numbers) are employed. 

Neural networks (NNs) [Haykin94] are highly parallel, highly interconnected systems. Such characteristics
make their implementation very challenging, and also very costly, due to the large amount of
hardware required.

A feedforward NN is depicted in Figure 22.11. In this case, the circuit has two layers, each with three
neurons, and each neuron with three synaptic weights. Only the parameters relative to the first layer are
explicitly marked in the figure, where $x_i$ and $y_i$ are the inputs and outputs, respectively, $w_{ij}$ denotes the
weights, and $t_i$ the threshold levels.

The function computed by a neuron is given by: 


$$y_j = f\left(\sum_i x_i w_{ij} - t_j\right)$$


FIGURE 22.11. (a) Two-layer feedforward neural network; (b) Sigmoid computed by neurons. 


where f() is a nonlinear monotonically increasing function, preferably of sigmoidal type, that is:

$$y_j = \frac{1 - e^{-\sigma_j}}{1 + e^{-\sigma_j}}$$

where $\sigma_j = \sum_i x_i w_{ij} - t_j$. This function is plotted in Figure 22.11(b).
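
As a quick numerical check of the neuron function, the sketch below evaluates it in floating point, assuming the sigmoid has the form y = (1 - e^(-σ))/(1 + e^(-σ)). That form is consistent with the values quoted later in this section (for example, y = 0.99 at σ = 5.293). The function names are illustrative only.

```python
import math

def sigmoid(sigma: float) -> float:
    """Odd sigmoid (1 - e^-s) / (1 + e^-s); its output lies between -1 and 1."""
    return (1.0 - math.exp(-sigma)) / (1.0 + math.exp(-sigma))

def neuron(x, w, t=0.0):
    """Weighted sum of the inputs minus the threshold, passed through the sigmoid."""
    sigma = sum(xi * wi for xi, wi in zip(x, w)) - t
    return sigmoid(sigma)

print(round(sigmoid(5.293), 3))             # close to the 0.99 quoted in the text
print(neuron([1.0, 2.0], [0.5, 0.25]))      # sigma = 1.0
```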


NN implementation 


Due to the huge amount of hardware needed to construct an NN, several aspects must be considered. In 
the discussion that follows we assume that N bits are employed to represent the inputs (x), the weights (w), 
and the outputs (y). The first consideration regards the number of inputs. To save pins, one might opt for 
using only one input (that is, N pins) through which all inputs are entered serially (one at a time) in accordance
with the clock. Likewise, one might opt for only one multiplier per neuron instead of M (where M is
the number of weights per neuron). Another important consideration regards the storage of the weights; if 
they do not need to undergo in-system changes, then they can be stored on-chip by a ROM-like structure 
(created using CONSTANT). Otherwise, if the weights require in-system programmability, a more flexible 
solution (with external control) must be implemented, like the use of a shift register to enter and store the 
weights or the construction of an on-chip RAM-like memory. 

Two (among several other) alternatives are illustrated in Figure 22.12. In (a), the weights are entered 
through a shift register that is controlled by a multiplexer. While load_w='1', the weights are entered 
and stored, after which load_w='0' causes the shift register to operate in a circular fashion. In (b), the 
weights are stored in a ROM-like memory, so they can only be changed by recompiling the code. In this 
circuit all nodes are explicitly named, and its operation is as follows. Initially, the register that stores the 
accumulated value (acc2) is reset. Next, the first input value is presented and gets multiplied by the first 
weight, producing prod = x·w. Note that because x and w are N bits wide, the size of prod must be 2N bits
(the same occurs with acc1 and acc2).

The next circuit section in both cases of Figure 22.12 is an accumulator, which computes acc1 = prod + acc2,
stored at the next positive clock edge and thus replacing the previously accumulated value. The next section
is where the sigmoidal function is computed; it takes acc2 (2N bits) as input and produces sig (with
N bits) at the output. The latter is stored when rst='1' occurs, producing a clean and stable value for y.
Notice that this circuit requires a reset pulse to be applied after each series of input (x) values is concluded.

Besides the challenges already mentioned, another major challenge to implement an NN is with
regard to the hardware needed to compute the sigmoid. To simplify the design, this function is sometimes
replaced with a simpler function, like a threshold function or a linear function with saturation.


FIGURE 22.12. Two options for a neuron implementation: (a) With the inputs (x) entered one at a time and 
the weights entered serially and stored by a shift register; (b) Again with the inputs entered one at a time but 
with the weights stored in a ROM-like memory. 


These, however, are computationally inferior (from a learning perspective) to the sigmoid, so the 
latter is included in the present example. 

In this design, N=6 was adopted, signifying that x, w, and y can range from -32 to 31, thus resulting
in 2N=12 bits for the internal signals (prod, acc1, acc2), that is, a range between -2048 and 2047. For the
sigmoid, eight points per quadrant were adopted (that is, eight points for positive values of σ plus eight for
negative values), plus zero, hence totaling 17 points. For simplicity, t=0 was used, so σ=Σ(x·w). For
σ=0, the corresponding output is y=0, while for the other values of σ the output was quantized using
eight equal intervals per quadrant. Dividing the maximum value of y (=1) by 8, the following eight
sectors result: 0<y<0.125, 0.125≤y<0.25, ..., 0.875≤y<1. For σ, the maximum value was taken when
y=0.99 occurs, that is, σ=5.293. Applying the sector boundaries to the expression of the sigmoid results in
σ=0.251 for y=0.125, σ=0.511 for y=0.250, etc., which are listed in the table on the left of Figure 22.13, with y
taken in the center of the corresponding quantization interval. The normalization was done as follows.
For the input, because it contains 12 bits, the maximum value of σ (=5.293) was encoded as 2047, while
for the output, having 6 bits, the maximum value of y (=1) was encoded as 31. After rounding, the table
shown on the right of Figure 22.13 resulted.
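
The comparator thresholds of Figure 22.13 can be reproduced with a few lines of Python. The sketch assumes the sigmoid y = (1 - e^(-σ))/(1 + e^(-σ)), whose inverse is σ = ln((1 + y)/(1 - y)), and uses the scaling stated above (σ = 5.293 encoded as 2047). Variable names are illustrative.

```python
import math

SIGMA_MAX = 5.293    # sigma at y = 0.99, as quoted in the text
INPUT_MAX = 2047     # largest signed 12-bit value

def sigma_for(y: float) -> float:
    """Inverse of y = (1 - e^-s) / (1 + e^-s)."""
    return math.log((1.0 + y) / (1.0 - y))

boundaries = [k / 8 for k in range(1, 8)]          # y = 0.125, 0.25, ..., 0.875
sigmas = [sigma_for(y) for y in boundaries]
thresholds = [round(s * INPUT_MAX / SIGMA_MAX) for s in sigmas]

print([round(s, 3) for s in sigmas])   # sector boundaries before normalization
print(thresholds)                      # sector boundaries after normalization
```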


VHDL code 


A complete VHDL code is shown below, which implements a one-layer NN with the general architecture
of Figure 22.12(b). The code can be easily adapted to any number of neurons, with any number of
synapses, and any number of bits to represent them. For simulation purposes, N=6 bits, 3 neurons, and 
5 synaptic weights per neuron were employed. 

As can be seen, the code is divided into three parts. The first part is a PACKAGE where circuit parameters,
data types, and the list of weights are specified. The second part is another PACKAGE that computes the
Before normalization      After normalization
σ                         σ                     y
0                         0                     0
0 < σ < 0.251             0 < σ < 97            2
0.251 ≤ σ < 0.511         97 ≤ σ < 198          6
0.511 ≤ σ < 0.788         198 ≤ σ < 305         10
0.788 ≤ σ < 1.099         305 ≤ σ < 425         14
1.099 ≤ σ < 1.466         425 ≤ σ < 567         18
1.466 ≤ σ < 1.946         567 ≤ σ < 753         22
1.946 ≤ σ < 2.708         753 ≤ σ < 1047        26
σ ≥ 2.708                 σ ≥ 1047              30


FIGURE 22.13. Sigmoidal function implementation with signed 12-bit input and signed 6-bit output, for 
eight points per quadrant plus zero, using only integers. 


sigmoid (these two packages can obviously be put together but were written separately to make the 
overall code simpler to understand). Finally, in the third part, the main code is constructed. 

In the first PACKAGE, circuit parameters are specified (lines 7-9), then three special data types are
defined (lines 11-14), whose purpose is to allow the specification of the internal signals for an array of
neurons and also for the weights. Note that the type defined in lines 13 and 14 allows the weights to be
entered as simple signed integers, which is as simple as it can get. This occurs in lines 17-19 with one set
of weights (that is, for one neuron) in each line.

The second package contains a FUNCTION (Section 19.14) that converts a 12-bit signal into a 6-bit 
signal using a sigmoidal conversion, constructed according to the approach described earlier. Note that 
the values used in the comparators are those listed in Figure 22.13, and that the returned value has the 
same sign as the input value. 

Finally, in the third part, the main code is presented, which is a direct implementation of the circuit
shown in Figure 22.12(b) (it is an array of circuits). To make the code simpler to understand, a separate
process was employed for each circuit section. The first process (lines 20-29) implements a counter,
which is needed to control the inputs and to act as a pointer to the stored weights. The second process
(lines 31-45) implements the registers for the signals acc1 → acc2 and sigmoid → y. The last process (lines
47-59) constructs the combinational units of the circuit. Note that an overflow check has been included in
lines 52-56, which causes the accumulated value to assume the largest possible value (positive or negative,
matching the sign of the operands) if overflow occurs (observe the indexing employed in lines 52-55 and the
importance of understanding how to deal with data types of dimensions 1D, 1D×1D, and 2D [Pedroni04a]).

Simulation results are displayed in Figure 22.14. Note that after five values of x are presented (each 
neuron has five synapses, so five inputs) a reset pulse must occur. Examine the accumulated (acc2) and 
output (y) values obtained in the simulation and compare them against the expected values (for the 
weights, use the values entered in lines 17-19 of the first package) and verify that they coincide. 
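
The expected values can also be computed with a small software model before looking at the waveforms. The Python sketch below accumulates x·w with 12-bit saturation and then applies the quantized sigmoid of Figure 22.13. The weight rows are those of lines 17 and 19 of the first package, while the input sequence is a made-up example and the helper names are illustrative.

```python
from bisect import bisect_right

THRESHOLDS = [97, 198, 305, 425, 567, 753, 1047]   # comparator values, Figure 22.13
OUTPUTS = [2, 6, 10, 14, 18, 22, 26, 30]           # quantized outputs, Figure 22.13

def conv_sigmoid(acc: int) -> int:
    """Quantized sigmoid: the magnitude comes from the table, the sign from the input."""
    a = abs(acc)
    b = 0 if a == 0 else OUTPUTS[bisect_right(THRESHOLDS, a)]
    return b if acc >= 0 else -b

def expected_output(x_values, weights):
    """Accumulated value and neuron output after one series of inputs."""
    acc = 0
    for x, w in zip(x_values, weights):
        acc = max(-2048, min(2047, acc + x * w))   # 12-bit saturating accumulation
    return acc, conv_sigmoid(acc)

x_values = [3, -2, 5, 1, 4]                 # hypothetical input sequence
neuron1 = [1, 4, 5, 5, -5]                  # weights of neuron 1 (line 17)
neuron3 = [-30, -30, -30, -30, -30]         # weights of neuron 3 (line 19)
print(expected_output(x_values, neuron1))
print(expected_output(x_values, neuron3))
```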

After studying this design carefully, other architectures can be implemented, including the addition 
of more layers. 


FIGURE 22.14. Simulation results from the VHDL code for the neural network shown above (one layer, three
neurons, five synapses per neuron, signed 6-bit representation, single multiplexed input, multiple outputs (for
cascading)).

1   ------ Package of data types and weight values: --------------------
2   LIBRARY ieee;
3   USE ieee.numeric_std.all;
4   --------------------------------------------------------------------
5   PACKAGE package_of_types_and_weights IS
6      ------ Part 1: Circuit parameters: ------------------------------
7      CONSTANT neurons: INTEGER := 3;   --# of neurons
8      CONSTANT weights: INTEGER := 5;   --# of weights per neuron
9      CONSTANT N: INTEGER := 6;         --# of bits per weight
10     ------ Part 2: Data types: --------------------------------------
11     TYPE short_array IS ARRAY (1 TO neurons) OF SIGNED(N-1 DOWNTO 0);
12     TYPE long_array IS ARRAY (1 TO neurons) OF SIGNED(2*N-1 DOWNTO 0);
13     TYPE weight_array IS ARRAY (1 TO neurons, 1 TO weights) OF
14        INTEGER RANGE -(2**(N-1)) TO 2**(N-1)-1;
15     ------ Part 3: Weight values (signed integers): -----------------
16     CONSTANT weight: weight_array := (
17        (1, 4, 5, 5, -5),              --neuron 1
18        (5, 20, 25, 25, 25),           --neuron 2
19        (-30, -30, -30, -30, -30));    --neuron 3
20     -----------------------------------------------------------------
21  END package_of_types_and_weights;
22  --------------------------------------------------------------------


1   ------ Package with sigmoidal function: ----------------------------
2   LIBRARY ieee;
3   USE ieee.std_logic_1164.all;
4   USE ieee.numeric_std.all;
5   USE work.package_of_types_and_weights.all;
6   --------------------------------------------------------------------
7   PACKAGE package_of_sigmoid IS
8      FUNCTION conv_sigmoid (SIGNAL input: SIGNED) RETURN SIGNED;
9   END package_of_sigmoid;
10  --------------------------------------------------------------------
11  PACKAGE BODY package_of_sigmoid IS
12     FUNCTION conv_sigmoid (SIGNAL input: SIGNED) RETURN SIGNED IS
13        VARIABLE a: INTEGER RANGE 0 TO 4**N-1;
14        VARIABLE b: INTEGER RANGE 0 TO 2**N-1;
15     BEGIN
16        a := TO_INTEGER(ABS(input));
17        IF (a=0) THEN b:=0;
18        ELSIF (a>0 AND a<97) THEN b:=2;
19        ELSIF (a>=97 AND a<198) THEN b:=6;
20        ELSIF (a>=198 AND a<305) THEN b:=10;
21        ELSIF (a>=305 AND a<425) THEN b:=14;
22        ELSIF (a>=425 AND a<567) THEN b:=18;
23        ELSIF (a>=567 AND a<753) THEN b:=22;
24        ELSIF (a>=753 AND a<1047) THEN b:=26;
25        ELSE b:=30;
26        END IF;
27        IF (input(2*N-1)='0') THEN
28           RETURN TO_SIGNED(b, N);
29        ELSE
30           RETURN TO_SIGNED(-b, N);
31        END IF;
32     END conv_sigmoid;
33  END package_of_sigmoid;
34  --------------------------------------------------------------------
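
For checking simulation results by hand, the conversion performed by conv_sigmoid can be mirrored in software. The Python sketch below follows the same comparator chain (lines 17-25) and the same sign handling (lines 27-31); it is a reference model for plain integers, not a replacement for the VHDL function.

```python
THRESHOLDS = [97, 198, 305, 425, 567, 753, 1047]   # comparator values of lines 18-24
OUTPUTS = [2, 6, 10, 14, 18, 22, 26, 30]           # values assigned to b

def conv_sigmoid(acc: int) -> int:
    """Map a signed 12-bit accumulator value to the signed 6-bit sigmoid output."""
    a = abs(acc)
    if a == 0:
        b = 0
    else:
        # Number of thresholds already reached selects the output sector
        b = OUTPUTS[sum(a >= th for th in THRESHOLDS)]
    return b if acc >= 0 else -b

for value in (0, 96, 97, -305, 2047):
    print(value, conv_sigmoid(value))
```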


1   ------ Main code: --------------------------------------------------
2   LIBRARY ieee;
3   USE ieee.std_logic_1164.all;
4   USE ieee.numeric_std.all;
5   USE work.package_of_types_and_weights.all;
6   USE work.package_of_sigmoid.all;
7   --------------------------------------------------------------------
8   ENTITY neural_net IS
9      PORT (clk, rst: IN STD_LOGIC;
10           x: IN SIGNED(N-1 DOWNTO 0);
11           y: OUT short_array);
12  END neural_net;
13  --------------------------------------------------------------------
14  ARCHITECTURE neural_net OF neural_net IS
15     SIGNAL prod, acc1, acc2: long_array;
16     SIGNAL sigmoid: short_array;
17     SIGNAL counter: INTEGER RANGE 1 TO weights+1;
18  BEGIN
19     ---- Process for counter: ---------------------------
20     PROCESS(clk)
21     BEGIN
22        IF (clk'EVENT AND clk='1') THEN
23           IF (rst='1') THEN
24              counter <= 1;
25           ELSE
26              counter <= counter + 1;
27           END IF;
28        END IF;
29     END PROCESS;
30     ---- Registers for acc2 and y: ----------------------
31     PROCESS(clk)
32     BEGIN
33        IF (clk'EVENT AND clk='1') THEN
34           IF (rst='1') THEN
35              FOR i IN 1 TO neurons LOOP
36                 y(i) <= sigmoid(i);
37                 acc2(i) <= (OTHERS=>'0');
38              END LOOP;
39           ELSE
40              FOR i IN 1 TO neurons LOOP
41                 acc2(i) <= acc1(i);
42              END LOOP;
43           END IF;
44        END IF;
45     END PROCESS;
46     ---- Process for combinational units: ---------------
47     PROCESS(x, counter)
48     BEGIN
49        FOR i IN 1 TO neurons LOOP
50           prod(i) <= x * TO_SIGNED(weight(i, counter), N);
51           acc1(i) <= prod(i) + acc2(i);
52           IF ((acc2(i)(2*N-1)=prod(i)(2*N-1)) AND
53               (acc1(i)(2*N-1)/=acc2(i)(2*N-1))) THEN
54              acc1(i) <= ((2*N-1)=>acc2(i)(2*N-1),
55                          OTHERS=>NOT acc2(i)(2*N-1));
56           END IF;
57           sigmoid(i) <= conv_sigmoid(acc2(i));
58        END LOOP;
59     END PROCESS;
60  END neural_net;
61  --------------------------------------------------------------------
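
The overflow check of lines 52-56 is a sign-based saturation: if prod and acc2 have the same sign but their wrapped sum has the opposite sign, the result is replaced by the largest value of that sign. The Python sketch below models this rule for 12-bit two's-complement numbers; the function names are illustrative, and the inputs are assumed to be within the 12-bit range already.

```python
BITS = 12
MAX_VAL = 2 ** (BITS - 1) - 1     #  2047
MIN_VAL = -(2 ** (BITS - 1))      # -2048

def wrap(v: int) -> int:
    """Interpret v modulo 2**BITS as a signed two's-complement number."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v >> (BITS - 1) else v

def saturating_add(prod: int, acc2: int) -> int:
    """acc1 = prod + acc2 with the sign-based overflow check of the listing."""
    acc1 = wrap(prod + acc2)
    same_sign_in = (prod < 0) == (acc2 < 0)      # sign bits of the operands agree
    sign_flipped = (acc1 < 0) != (acc2 < 0)      # sign bit of the result differs
    if same_sign_in and sign_flipped:
        return MAX_VAL if acc2 >= 0 else MIN_VAL
    return acc1

print(saturating_add(2000, 100))     # positive overflow
print(saturating_add(-2000, -100))   # negative overflow
print(saturating_add(500, -700))     # no overflow
```

For operands within range this gives the same result as clamping the exact sum to the interval from -2048 to 2047.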


22.7 Exercises 


In all exercises below, the VHDL code must be written, compiled, debugged, and then carefully 
simulated. 


1. Tapped delay line 
Using VHDL, design the tapped delay line of Figure 14.2(d). 
2. Shift register with load 


Present two “downgraded” solutions for the SR designed in Section 22.1 with the following 
simplifications: 


a. Still with an arbitrary number of stages (M) but only 1 bit per stage (N=1). 
b. Still with an arbitrary number of bits (N) but a fixed number of stages (M=4). 
Try to develop solutions with and without COMPONENT. 
3. Pseudo-random sequence generator 
Using VHDL, design the pseudo-random sequence generator of Figure 14.30. 
4. Frequency meter 


Using VHDL, design the frequency meter illustrated in Figure 22.9(b). 
5. Digital PWM
Using VHDL, design the digital PWM circuit of Exercise 14.36.
6. Divide-by-5 with symmetric phase
Using VHDL, design the divide-by-5 circuit with symmetric phase of Exercise 14.38.
7. Timer #1


Consider the timer that was designed in Section 22.3, shown in Figure E22.7 with additional features. 
Present an improved solution, replacing the enable (ena) and reset (rst) ON-OFF switches with just 
one push button, such that every time the button is pressed (and released) the timer switches its 
state, that is, it stops if it is running, or it resumes running when stopped. If the switch is pressed for 
a time longer than two seconds, then the timer should be reset. When the timer reaches 60, it must 
stop, returning to zero only after a reset occurs. Note in Figure E22.7 that the normal state of the 
ena input is '1' (the push button produces a '0' when pressed). The clock frequency (fclk) should be 
entered using the GENERIC statement. 


FIGURE E22.7. (Diagram labels: VDD = 3.3 V, full_count, count2.)


8. Timer #2
Add, to the solution developed for Exercise 22.7 above, a debouncing circuit for the push button.
9. Timer #3
In continuation of the exercise above, modify the full_count output, such that instead of simply
going to '1' when full count (that is, 60 seconds) is reached it starts blinking, remaining so until the
push button is pressed (for more than 2 s) to reset the timer. Adopt 2 Hz for the blinking frequency.
10. Neural network


Compile the code in Section 22.6 and perform different simulations to get acquainted with its 
structure. 


Change the values of the weights, then calculate the new values that should be produced in the 
simulations, and finally compile and simulate the code to verify the actual results. 


Make the neuron bigger (note that the constants in lines 7-9 of the initial package are very helpful
for that), then again compile and simulate the code to compare the actual results against your
predictions.


Add more neurons to the structure, again fully simulating the design to understand its details. 


Finally, add other layers to it. 


VHDL Design 
of State Machines 


Objective: In this chapter we conclude our series of design examples using VHDL. This chapter 
finalizes the study of sequential circuits, which was initiated in the previous chapter. However, while 
the former showed regular sequential designs, this chapter presents state-machine-based designs. Like the 
previous chapter, this too is closed with a larger example in which an LCD driver is designed. In the next 
chapter, the use of VHDL for simulation instead of synthesis will be presented. 


Chapter Contents 


23.1 String Detector 

23.2 “Universal” Signal Generator 
23.3 Car Alarm

23.4 LCD Driver 

23.5 Exercises 


23.1 String Detector 


We want to design a circuit that takes a serial stream of ASCII characters (Figure 2.12) as input and outputs
a '1' whenever the sequence "mp3" occurs. As seen in Chapter 15, this is a typical case in which the
FSM approach is helpful.

First, the state transition diagram must be drawn, which is shown in Figure 23.1 (this problem is 
similar to that designed in Example 15.5). There are four states, called waiting, first_char (to which the 
FSM should move when "m" is detected), second_char (after "m" and "p" have been detected), and finally 
third_char (after "m," "p," and "3" have been detected). A top-level diagram for the circuit is also included 
in Figure 23.1. 

A VHDL code for this circuit is shown below, which is a direct application of the template seen in 
Section 19.16. The decimal values corresponding to the ASCII characters that must be detected are 
(Figure 2.12) "m"=109, "p"=112, and "3"=51. However, ASCII characters are synthesizable, so instead of
declaring the inputs as integers we can employ CHARACTER directly (line 7). Note that in VHDL, characters
are represented with 8 bits.

The code contains a user-defined enumerated data type (detector_state, lines 12 and 13) to encode the 
states, followed by a declaration of signals (pr_state and nx_state, line 14) that conform to that data type. 
The lower (sequential) section of the FSM is implemented by the process in lines 17-24. Note that a value 
FIGURE 23.1. String detector ("mp3").

FIGURE 23.2. Simulation results from the VHDL code for the "mp3" string detector of Figure 23.1.


is assigned to a signal (pr_state, line 22) at the transitions of another signal (clk, line 21), so registers are 
inferred. The upper (combinational) section is implemented using the CASE statement in the process of 
lines 26-62. Simulation results are depicted in Figure 23.2. 


1   ----------------------------------------------------
2   LIBRARY ieee;
3   USE ieee.std_logic_1164.all;
4   ----------------------------------------------------
5   ENTITY string_detector IS
6      PORT (clk, rst: IN STD_LOGIC;
7            ascii_in: IN CHARACTER;   --8 bits
8            string_detected: OUT STD_LOGIC);
9   END string_detector;
10  ----------------------------------------------------
11  ARCHITECTURE fsm OF string_detector IS
12     TYPE detector_state IS (waiting, first_char,
13        second_char, third_char);
14     SIGNAL pr_state, nx_state: detector_state;
15  BEGIN
16     ----- Lower section: ----------------------------
17     PROCESS (clk, rst)
18     BEGIN
19        IF (rst='1') THEN
20           pr_state <= waiting;
21        ELSIF (clk'EVENT AND clk='1') THEN
22           pr_state <= nx_state;
23        END IF;
24     END PROCESS;
25     ----- Upper section: ----------------------------
26     PROCESS (pr_state, ascii_in)
27     BEGIN
28        CASE pr_state IS
29           WHEN waiting =>
30              string_detected <= '0';
31              IF (ascii_in='m') THEN      --detect 'm'
32                 nx_state <= first_char;
33              ELSE
34                 nx_state <= waiting;
35              END IF;
36           WHEN first_char =>
37              string_detected <= '0';
38              IF (ascii_in='p') THEN      --detect 'p'
39                 nx_state <= second_char;
40              ELSIF (ascii_in='m') THEN   --detect 'm'
41                 nx_state <= first_char;
42              ELSE
43                 nx_state <= waiting;
44              END IF;
45           WHEN second_char =>
46              string_detected <= '0';
47              IF (ascii_in='3') THEN      --detect '3'
48                 nx_state <= third_char;
49              ELSIF (ascii_in='m') THEN   --detect 'm'
50                 nx_state <= first_char;
51              ELSE
52                 nx_state <= waiting;
53              END IF;
54           WHEN third_char =>
55              string_detected <= '1';
56              IF (ascii_in='m') THEN      --detect 'm'
57                 nx_state <= first_char;
58              ELSE
59                 nx_state <= waiting;
60              END IF;
61        END CASE;
62     END PROCESS;
63  END fsm;
64  ----------------------------------------------------
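
The state transitions coded in the CASE statement can be exercised quickly with a software model before running a VHDL simulation. The Python sketch below applies the same next-state rules, one character per clock edge, and records whether the machine is in third_char after each character. The function names and the test string are illustrative.

```python
def next_state(state: str, ch: str) -> str:
    """Next-state logic of the "mp3" detector, with the priorities of the CASE statement."""
    if state == "first_char" and ch == "p":
        return "second_char"
    if state == "second_char" and ch == "3":
        return "third_char"
    if ch == "m":                  # an 'm' always (re)starts a match
        return "first_char"
    return "waiting"

def detect(stream: str) -> list[int]:
    """1 where the machine is in third_char after consuming the character, else 0."""
    state = "waiting"
    flags = []
    for ch in stream:
        state = next_state(state, ch)
        flags.append(1 if state == "third_char" else 0)
    return flags

print(ord("m"), ord("p"), ord("3"))    # decimal ASCII codes of the three characters
print(detect("xmp3mmp3p"))
```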


23.2 “Universal” Signal Generator


A generic approach for the construction of any binary waveform was introduced in Section 15.7. As 
shown in Figure 23.3(a) (copied from Figure 15.34(c)), it consists of two multiplexers, controlled by 
two FSMs that operate at different clock edges, where the first FSM-MUX pair generates the desired 
FIGURE 23.3. Signal generator.


waveform and the second FSM-MUX pair eliminates any glitches. We want to design the signal generator
of Example 15.10 using the same technique, but now with VHDL. The signal to be produced
(y) is shown in Figure 23.3(b). Note that the system resolution must be maximum (that is, one-half
of a clock cycle; see Section 15.3), or, in other words, the circuit must operate at both clock edges.

Recall from Example 15.10 that MUX1 must select the sequence {'1' → clk → '1'}, while MUX2 must
choose {x → '1' → x}. Recall also that the machines must be synchronized, which, as explained in
Section 15.6, can be achieved with the interconnection nx_state2 = pr_state1 when using two machines,
or pr_state2 = pr_state1 when using a quasi-single machine. The former option is adopted here, while
the latter is treated in Exercise 23.11 (though the latter solution might sometimes save some flip-flops,
it lacks the elegance of the former).

A VHDL code for this problem is shown below. It contains two FSMs, each designed according to 
the template in Section 19.16. The lower (sequential) section of FSM1 is in lines 14-19, while its upper 
(combinational) section, which implements the multiplexer, is in lines 21-34. For FSM2, the lower section 
is in lines 36-41, while the upper section is in lines 43-54. The first machine operates at the negative clock 
edge, whereas the second operates at the positive clock transition. Synchronism between FSM1 and 
FSM2 is established in line 45. Note in line 4 that the signal x was brought outside, so the occurrence of 
glitches can be checked in the simulations. 


1   -----------------------------------------------------------------
2   ENTITY signal_generator IS
3      PORT (clk: IN BIT;
4            x: BUFFER BIT;
5            y: OUT BIT);
6   END signal_generator;
7   -----------------------------------------------------------------
8   ARCHITECTURE fsm OF signal_generator IS
9      TYPE state IS (A, B, C);
10     SIGNAL pr_state1, nx_state1: state;
11     SIGNAL pr_state2, nx_state2: state;
12  BEGIN
13     ----- Lower section of FSM1: ---------------------------------
14     PROCESS (clk)
15     BEGIN
16        IF (clk'EVENT AND clk='0') THEN
17           pr_state1 <= nx_state1;
18        END IF;
19     END PROCESS;
20     ----- Upper section of FSM1: (MUX1 included): ----------------
21     PROCESS (pr_state1, clk)
22     BEGIN
23        CASE pr_state1 IS
24           WHEN A =>
25              x <= '1';
26              nx_state1 <= B;
27           WHEN B =>
28              x <= clk;
29              nx_state1 <= C;
30           WHEN C =>
31              x <= '1';
32              nx_state1 <= A;
33        END CASE;
34     END PROCESS;
35     ----- Lower section of FSM2: ---------------------------------
36     PROCESS (clk)
37     BEGIN
38        IF (clk'EVENT AND clk='1') THEN
39           pr_state2 <= nx_state2;
40        END IF;
41     END PROCESS;
42     ----- Upper section of FSM2: (MUX2 included): ----------------
43     PROCESS (pr_state1, pr_state2, x)
44     BEGIN
45        nx_state2 <= pr_state1;   --synchronism
46        CASE pr_state2 IS
47           WHEN A =>
48              y <= x;
49           WHEN B =>
50              y <= '1';
51           WHEN C =>
52              y <= x;
53        END CASE;
54     END PROCESS;
55  END fsm;
56  -----------------------------------------------------------------


Simulation results are displayed in Figure 23.4. As expected, a glitch does occur in x when MUX1
commutes from clk to '1'. This glitch is suppressed by the second FSM-MUX pair, resulting in a clean
signal for y.
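
Why the second pair masks the glitch can be seen in an idealized, zero-delay model that advances in half clock cycles. The Python sketch below is such a model of the listing, assuming both machines start in state A (the leftmost value of the enumerated type) and that each half-cycle begins with a clock edge. Because it has no propagation delays it cannot show the glitch itself, only which input MUX2 is selecting at each moment.

```python
NEXT = {"A": "B", "B": "C", "C": "A"}     # state sequence of FSM1

def simulate(half_cycles: int):
    """Rows of (clk, pr_state1, pr_state2, x, y), one per half clock cycle."""
    s1 = s2 = "A"
    rows = []
    for h in range(half_cycles):
        clk = 1 if h % 2 == 0 else 0
        if clk == 1:
            s2 = s1                # positive edge: FSM2 copies the state of FSM1
        else:
            s1 = NEXT[s1]          # negative edge: FSM1 advances
        x = clk if s1 == "B" else 1    # MUX1: '1', clk, '1'
        y = 1 if s2 == "B" else x      # MUX2: x, '1', x
        rows.append((clk, s1, s2, x, y))
    return rows

for row in simulate(6):
    print(row)
```

In the half-cycle where FSM1 has just moved from B to C (MUX1 switching from clk to '1'), FSM2 is still in B, so y is tied to '1' and does not depend on x.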


To conclude, the reader is invited to compile this code and check if the number of DFFs inferred by the 
compiler matches that in our previous solution of Figure 15.37(e) (do not forget to set the encoding style 
to minimal bits or binary sequential). 
FIGURE 23.4. Simulation results from the signal generator of Figure 23.3, showing glitches in the intermediate
signal (x) during state transitions from clk to '1', but a clean signal at the output (y).

FIGURE 23.5. Car alarm.


23.3 Car Alarm 


A car alarm was described in Exercise 15.17, and its top-level diagram is repeated in Figure 23.5(a). As 
indicated, it contains four inputs (remote, sensors, rst, clk) and one output (siren). The corresponding FSM 
should have at least three states, called disarmed, armed, and intrusion, as illustrated in Figure 23.5(b). If 
remote='1' occurs, the system must change from disarmed to armed or vice versa depending on its current 
state. If armed, it must change to intrusion when sensors ='1' happens, thus activating the siren (siren ='1'). 
When in intrusion, it can only be deactivated (returning to disarmed) by another remote ='1' command. 

The design of this car alarm will be shown in three progressive levels. 


■ Case 1: Basic alarm
■ Case 2: Alarm with debounced inputs
■ Case 3: Alarm with debounced inputs and ON/OFF chirps


Case 1 Basic alarm 


The machine of Figure 23.5(b) will be designed in this case. Note, however, that it must be improved to 
fix a major flaw because it does not require remote to go to '0' before being valid again. Consequently, 
when the system changes from disarmed to armed, it starts flipping back and forth between these two 
states if a long remote='1' command is given (one that lasts several clock cycles). This is also a problem 
when turning the alarm off. 
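
The flaw is easy to reproduce with a cycle-level model of the three-state machine. The Python sketch below samples remote and sensors once per clock edge and applies the transitions described above; giving sensors priority over remote in the armed state is an assumption, and the function name is illustrative.

```python
def basic_alarm(remote_samples, sensors_samples):
    """State after each clock edge for the three-state machine without any fix."""
    state = "disarmed"
    trace = []
    for remote, sensors in zip(remote_samples, sensors_samples):
        if state == "disarmed":
            if remote:
                state = "armed"
        elif state == "armed":
            if sensors:
                state = "intrusion"
            elif remote:
                state = "disarmed"
        else:                          # intrusion
            if remote:
                state = "disarmed"
        trace.append(state)
    return trace

# A remote command held for four clock cycles makes the machine flip back and forth
print(basic_alarm([1, 1, 1, 1], [0, 0, 0, 0]))
```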

The machine of Figure 23.5(b) can be fixed by introducing intermediate (temporary) states in which 
the system waits until remote='0' occurs. This alternative is depicted in Figure 23.6(a), with the waiting 
states (wait1 and wait2) represented with a lighter color. 

Another solution is to use some kind of flag that monitors the signal remote to make sure that only 
after it returns to zero a new state transition is allowed to occur. Such an alternative is depicted in 
FIGURE 23.6. Two alternatives to fix the machine of Figure 23.5(b): (a) With the introduction of intermediate
states (lighter color) where the system waits until remote='0' occurs; (b) With the use of a flag whose value
changes when remote='0' happens.

FIGURE 23.7. Simulation results from the VHDL code for the FSM of Figure 23.6(b).


Figure 23.6(b). In this case, the flag can be constructed using a toggle flip-flop (TFF), which changes its
output value every time remote goes to zero.


A VHDL code for the option shown in Figure 23.6(b) is shown below. It contains two processes that 
implement the two classical FSM sections in lines 27-34 (lower section) and 36-63 (upper section). A third 
process (lines 18-25) is employed to create the flag. The latter is simply a TFF, which causes the flag’s 
value to flip when the remote control button is released (that is, when remote returns to zero). Simulation
results from the circuit inferred from this code are shown in Figure 23.7.


1   ------------------------------------------------------------------
2   LIBRARY ieee;
3   USE ieee.std_logic_1164.all;
4   ------------------------------------------------------------------
5   ENTITY car_alarm IS
6      PORT (clk, rst, remote, sensors: IN STD_LOGIC;
7            siren: OUT STD_LOGIC);
8   END car_alarm;
9   ------------------------------------------------------------------
10  ARCHITECTURE fsm OF car_alarm IS
11     TYPE alarm_state IS (disarmed, armed, intrusion);
12     ATTRIBUTE enum_encoding: STRING;
13     ATTRIBUTE enum_encoding OF alarm_state: TYPE IS "sequential";
14     SIGNAL pr_state, nx_state: alarm_state;
15     SIGNAL flag: STD_LOGIC;
16  BEGIN
17     ----- Flag: ---------------------------------------------------
18     PROCESS (remote, rst)
19     BEGIN
20        IF (rst='1') THEN
21           flag <= '0';
22        ELSIF (remote'EVENT AND remote='0') THEN
23           flag <= NOT flag;
24        END IF;
25     END PROCESS;
26     ----- Lower section: ------------------------------------------
27     PROCESS (clk, rst)
28     BEGIN
29        IF (rst='1') THEN
30           pr_state <= disarmed;
31        ELSIF (clk'EVENT AND clk='1') THEN
32           pr_state <= nx_state;
33        END IF;
34     END PROCESS;
35     ----- Upper section: ------------------------------------------
36     PROCESS (pr_state, flag, remote, sensors)
37     BEGIN
38        CASE pr_state IS
39           WHEN disarmed =>
40              siren <= '0';
41              IF (remote='1' AND flag='0') THEN
42                 nx_state <= armed;
43              ELSE
44                 nx_state <= disarmed;
45              END IF;
46           WHEN armed =>
47              siren <= '0';
48              IF (sensors='1') THEN
49                 nx_state <= intrusion;
50              ELSIF (remote='1' AND flag='1') THEN
51                 nx_state <= disarmed;
52              ELSE
53                 nx_state <= armed;
54              END IF;
55           WHEN intrusion =>
56              siren <= '1';
57              IF (remote='1' AND flag='1') THEN
58                 nx_state <= disarmed;
59              ELSE
60                 nx_state <= intrusion;
61              END IF;
62        END CASE;
63     END PROCESS;
64  END fsm;
65  ------------------------------------------------------------------

Case 2 Alarm with debounced inputs 


To protect the system against noise, for any input signal transition to be considered as valid the signal 
must remain in the new value for at least a certain amount of time (for example, 5 ms). In other words, the 
signals remote and sensors must be “debounced.” 

This type of procedure (debouncing) was already seen in Section 22.2, so the construction of the new code 
is straightforward. It is shown below, where two additional processes are employed for debouncing remote 
(lines 21-37) and sensors (lines 39-55). The debounced signals are called delayed_remote and delayed_sensors. 

Note also in line 6 that the desired debouncing time interval is entered using the GENERIC statement (the 
corresponding number of clock cycles is actually specified), so the code can be easily adjusted to any clock 
frequency. A small value was adopted for this parameter in the simulations to ease the visualization of the 
results, which are depicted in Figure 23.8. Because debounce=3 was used, only inputs (remote and sensors) 
lasting three clock edges or longer are considered (that is, transferred to delayed_remote and delayed_sensors). 


FIGURE 23.8. Simulation results obtained from the VHDL code for the FSM of Figure 23.6(b) with debouncers
included.

1   ------------------------------------------------------
2   LIBRARY ieee;
3   USE ieee.std_logic_1164.all;
4   ------------------------------------------------------
5   ENTITY car_alarm IS
6      GENERIC (debounce: INTEGER := 3); --number clock pulses debouncer
7      PORT (clk, rst, remote, sensors: IN STD_LOGIC;
8         siren: OUT STD_LOGIC);
9   END car_alarm;
10  ------------------------------------------------------
11  ARCHITECTURE fsm OF car_alarm IS
12     TYPE alarm_state IS (disarmed, armed, intrusion);
13     ATTRIBUTE enum_encoding: STRING;
14     ATTRIBUTE enum_encoding OF alarm_state: TYPE IS "sequential";
15     SIGNAL pr_state, nx_state: alarm_state;
16     SIGNAL delayed_remote: STD_LOGIC;
17     SIGNAL delayed_sensors: STD_LOGIC;
18     SIGNAL flag: STD_LOGIC;
19  BEGIN
20     ----- Debouncer for 'remote': ---------------------
21     PROCESS (clk, rst)
22        VARIABLE count: INTEGER RANGE 0 TO debounce;
23     BEGIN
24        IF (rst='1') THEN
25           count := 0;
26        ELSIF (clk'EVENT AND clk='0') THEN
27           IF (delayed_remote /= remote) THEN
28              count := count + 1;
29              IF (count=debounce) THEN
30                 count := 0;
31                 delayed_remote <= remote;
32              END IF;
33           ELSE
34              count := 0;
35           END IF;
36        END IF;
37     END PROCESS;
38     ----- Debouncer for 'sensors': --------------------
39     PROCESS (clk, rst)
40        VARIABLE count: INTEGER RANGE 0 TO debounce;
41     BEGIN
42        IF (rst='1') THEN
43           count := 0;
44        ELSIF (clk'EVENT AND clk='0') THEN
45           IF (delayed_sensors /= sensors) THEN
46              count := count + 1;
47              IF (count=debounce) THEN
48                 count := 0;
49                 delayed_sensors <= sensors;
50              END IF;
51           ELSE
52              count := 0;
53           END IF;
54        END IF;
55     END PROCESS;
56     ----- Flag: ---------------------------------------
57     PROCESS (delayed_remote, rst)
58     BEGIN
59        IF (rst='1') THEN
60           flag <= '0';
61        ELSIF (delayed_remote'EVENT AND delayed_remote='0') THEN
62           flag <= NOT flag;
63        END IF;
64     END PROCESS;
65     ----- Lower section of FSM: -----------------------
66     PROCESS (clk, rst)
67     BEGIN
68        IF (rst='1') THEN
69           pr_state <= disarmed;
70        ELSIF (clk'EVENT AND clk='1') THEN
71           pr_state <= nx_state;
72        END IF;
73     END PROCESS;
74     ----- Upper section of FSM: -----------------------
75     PROCESS (pr_state, flag, delayed_remote, delayed_sensors)
76     BEGIN
77        CASE pr_state IS
78           WHEN disarmed =>
79              siren <= '0';
80              IF (delayed_remote='1' AND flag='0') THEN
81                 nx_state <= armed;
82              ELSE
83                 nx_state <= disarmed;
84              END IF;
85           WHEN armed =>
86              siren <= '0';
87              IF (delayed_sensors='1') THEN
88                 nx_state <= intrusion;
89              ELSIF (delayed_remote='1' AND flag='1') THEN
90                 nx_state <= disarmed;
91              ELSE
92                 nx_state <= armed;
93              END IF;
94           WHEN intrusion =>
95              siren <= '1';
96              IF (delayed_remote='1' AND flag='1') THEN
97                 nx_state <= disarmed;
98              ELSE
99                 nx_state <= intrusion;
100             END IF;
101       END CASE;
102    END PROCESS;
103 END fsm;
104 ------------------------------------------------------
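Since the generic debounce is expressed in clock cycles, a real debouncing time must first be converted using the clock frequency. The helper below is a minimal sketch of that conversion; the package name debounce_calc and the function debounce_cycles are hypothetical, introduced here only for illustration.

```vhdl
PACKAGE debounce_calc IS               -- hypothetical helper package
   -- Number of clock cycles contained in t_ms milliseconds
   FUNCTION debounce_cycles (fclk_hz: POSITIVE; t_ms: POSITIVE) RETURN NATURAL;
END debounce_calc;

PACKAGE BODY debounce_calc IS
   FUNCTION debounce_cycles (fclk_hz: POSITIVE; t_ms: POSITIVE) RETURN NATURAL IS
   BEGIN
      -- Divide first so the intermediate product stays within INTEGER range;
      -- the integer division truncates if fclk_hz is not a multiple of 1 kHz
      RETURN (fclk_hz / 1000) * t_ms;
   END debounce_cycles;
END debounce_calc;
```

For instance, with an assumed 50 MHz clock and the 5 ms interval mentioned in the text, debounce_cycles(50_000_000, 5) evaluates to (50,000,000/1000) x 5 = 250,000 clock cycles, which is the value that would be passed to the generic debounce.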


Case 3 Alarm with debounced inputs and ON/OFF chirps 


This is the most complete implementation. Besides the basic circuit plus the debouncers, chirps are added
to the system. When the alarm is activated, the siren must emit one chirp (with duration chirpON ~ 200 ms),
while during deactivation it must produce two chirps (with separation chirpOFF ~ 300 ms).

One alternative to implement this circuit is shown in Figure 23.9(a). Note that this FSM contains five
additional states (chirp1 to chirp5) when compared to the original FSM of Figure 23.5(b). Assuming that
the circuit is in the disarmed state, the occurrence of remote='1' turns it ON. However, before reaching the
armed state, it must go through the chirp1 state, which turns the siren ON and lasts chirpON clock cycles.
When in the armed state, the occurrence of sensors='1' moves the system to the intrusion state in which the
siren is turned ON and remains so until a command to disarm the alarm (remote='1') is provided. Note
the sequence of chirp states that the system must go through during the turn-off procedure, some with
the siren ON during chirpON clock cycles, others with it OFF during chirpOFF clock periods.

FIGURE 23.9. State transition diagrams for a car alarm with ON/OFF chirps: (a) Basic circuit with chirps
included, but with a loop problem when a long remote='1' command occurs; (b) Machine fixed using
intermediate states (lighter color) that put the system on hold until remote='0' occurs; (c) Machine fixed
using a flag that also monitors the occurrence of remote='0'.


Similar to the original FSM of case 1 (Figure 23.5(b)), the machine of Figure 23.9(a) also exhibits a loop
problem when a long remote='1' command is given (it causes the system to circulate in the loop formed
by the six leftmost states). To fix it, either of the two alternatives described in case 1 can be used. The option
with additional states (wait1 and wait2), which put the system on hold until remote='0' occurs, is depicted
in Figure 23.9(b), while the option using a flag (also monitoring the remote='0' condition) is presented in
Figure 23.9(c).

A VHDL code for the alternative illustrated in Figure 23.9(c) is shown below. Note that it also includes
debouncing procedures for both input signals (remote and sensors), hence resulting in a very robust (and
complete) alarm implementation. The time-related parameters are again specified using the GENERIC
statement (lines 6-9), so the code can be easily adapted to any clock frequency. Small values were
employed in the simulations to ease the visualization of the results, shown in Figure 23.10.

FIGURE 23.10. Simulation results from the circuit inferred with the VHDL code relative to the complete car
alarm of Figure 23.9(c), with debouncers and ON/OFF chirps included.


1   ------------------------------------------------------
2   LIBRARY ieee;
3   USE ieee.std_logic_1164.all;
4   ------------------------------------------------------
5   ENTITY car_alarm IS
6      GENERIC (debounce: INTEGER := 3; --number clock pulses debouncer
7         chirpON: INTEGER := 2;        --number clock pulses chirp=ON
8         chirpOFF: INTEGER := 2;       --number clock pulses chirp=OFF
9         max: INTEGER := 2);           --largest of chirpON, chirpOFF
10     PORT (clk, rst, remote, sensors: IN STD_LOGIC;
11        siren: OUT STD_LOGIC);
12  END car_alarm;
13  ------------------------------------------------------
14  ARCHITECTURE fsm OF car_alarm IS
15     TYPE alarm_state IS (disarmed, armed, intrusion, chirp1, chirp2,
16        chirp3, chirp4, chirp5);
17     ATTRIBUTE enum_encoding: STRING;
18     ATTRIBUTE enum_encoding OF alarm_state: TYPE IS "sequential";
19     SIGNAL pr_state, nx_state: alarm_state;
20     SIGNAL delayed_remote: STD_LOGIC;
21     SIGNAL delayed_sensors: STD_LOGIC;
22     SIGNAL flag: STD_LOGIC := '0';
23     SIGNAL timer: INTEGER RANGE 0 TO max;
24  BEGIN
25     ----- Debouncer for 'remote': ---------------------
26     PROCESS (clk, rst)
27        VARIABLE count: INTEGER RANGE 0 TO debounce;
28     BEGIN
29        IF (rst='1') THEN
30           count := 0;
31        ELSIF (clk'EVENT AND clk='0') THEN
32           IF (delayed_remote /= remote) THEN
33              count := count + 1;
34              IF (count=debounce) THEN
35                 count := 0;
36                 delayed_remote <= remote;
37              END IF;
38           ELSE
39              count := 0;
40           END IF;
41        END IF;
42     END PROCESS;
43     ----- Debouncer for 'sensors': --------------------
44     PROCESS (clk, rst)
45        VARIABLE count: INTEGER RANGE 0 TO debounce;
46     BEGIN
47        IF (rst='1') THEN
48           count := 0;
49        ELSIF (clk'EVENT AND clk='0') THEN
50           IF (delayed_sensors /= sensors) THEN
51              count := count + 1;
52              IF (count=debounce) THEN
53                 count := 0;
54                 delayed_sensors <= sensors;
55              END IF;
56           ELSE
57              count := 0;
58           END IF;
59        END IF;
60     END PROCESS;
61     ----- Flag: ---------------------------------------
62     PROCESS (delayed_remote, rst)
63     BEGIN
64        IF (rst='1') THEN
65           flag <= '0';
66        ELSIF (delayed_remote'EVENT AND delayed_remote='0') THEN
67           flag <= NOT flag;
68        END IF;
69     END PROCESS;
70     ----- Lower section of FSM: -----------------------
71     PROCESS (clk, rst)
72        VARIABLE count: INTEGER RANGE 0 TO max;
73     BEGIN
74        IF (rst='1') THEN
75           pr_state <= disarmed;
76        ELSIF (clk'EVENT AND clk='1') THEN
77           count := count + 1;
78           IF (count=timer) THEN
79              pr_state <= nx_state;
80              count := 0;
81           END IF;
82        END IF;
83     END PROCESS;
84     ----- Upper section of FSM: -----------------------
85     PROCESS (pr_state, flag, delayed_remote, delayed_sensors)
86     BEGIN
87        CASE pr_state IS
88           WHEN disarmed =>
89              siren <= '0';
90              timer <= 1;
91              IF (delayed_remote='1' AND flag='0') THEN
92                 nx_state <= chirp1;
93              ELSE
94                 nx_state <= disarmed;
95              END IF;
96           WHEN chirp1 =>
97              siren <= '1';
98              timer <= chirpON;
99              nx_state <= armed;
100          WHEN armed =>
101             siren <= '0';
102             timer <= 1;
103             IF (delayed_sensors='1') THEN
104                nx_state <= intrusion;
105             ELSIF (delayed_remote='1' AND flag='1') THEN
106                nx_state <= chirp3;
107             ELSE
108                nx_state <= armed;
109             END IF;
110          WHEN intrusion =>
111             siren <= '1';
112             timer <= 1;
113             IF (delayed_remote='1' AND flag='1') THEN
114                nx_state <= chirp2;
115             ELSE
116                nx_state <= intrusion;
117             END IF;
118          WHEN chirp2 =>
119             siren <= '0';
120             timer <= chirpOFF;
121             nx_state <= chirp3;
122          WHEN chirp3 =>
123             siren <= '1';
124             timer <= chirpON;
125             nx_state <= chirp4;
126          WHEN chirp4 =>
127             siren <= '0';
128             timer <= chirpOFF;
129             nx_state <= chirp5;
130          WHEN chirp5 =>
131             siren <= '1';
132             timer <= chirpON;
133             nx_state <= disarmed;
134       END CASE;
135    END PROCESS;
136 END fsm;
137 ------------------------------------------------------
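To connect the generics to real time intervals, the alarm can be instantiated inside a wrapper with values chosen for the actual clock. The sketch below assumes a 1 kHz clock (so one clock cycle equals 1 ms) and a 5 ms debouncing interval; the wrapper name car_alarm_top and the port name clk_1khz are hypothetical. The values for chirpON and chirpOFF follow the approximate 200 ms and 300 ms durations mentioned at the beginning of case 3.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY car_alarm_top IS                -- hypothetical wrapper name
   PORT (clk_1khz, rst, remote, sensors: IN STD_LOGIC;
         siren: OUT STD_LOGIC);
END car_alarm_top;

ARCHITECTURE wrapper OF car_alarm_top IS
BEGIN
   -- With a 1 kHz clock (assumed), each generic is simply a time in ms
   alarm: ENTITY work.car_alarm
      GENERIC MAP (debounce => 5,      -- 5 ms debouncing interval
                   chirpON  => 200,    -- 200 ms chirp
                   chirpOFF => 300,    -- 300 ms between chirps
                   max      => 300)    -- largest of chirpON, chirpOFF
      PORT MAP (clk => clk_1khz, rst => rst, remote => remote,
                sensors => sensors, siren => siren);
END wrapper;
```

For a different clock frequency only the four numbers in the GENERIC MAP need to change; the FSM code itself stays untouched.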
23.4 LCD Driver


Similar to Chapter 22, this chapter also closes with a longer design. 


Figure 23.11(a) shows a popular LCD (liquid crystal display) that contains two lines of 16 characters 
each. This kind of display is normally sold with an LCD controller installed on the back, responsible for 
driving the display’s dots. Such a controller normally is the HD44780U (Hitachi) or, equivalently, the 
KS0066U (Samsung), whose pinout is listed in Figure 23.11(b), with pins 15 and 16 used only when the 
LCD is fabricated with backlight. 


LCD controller 


To use an LCD, the first step is to understand the LCD controller. Looking at the pinout in Figure 23.11(b),
we observe that, besides power, the following four signals must be sent to the controller:

■ RS (register select): '0' selects the controller’s instruction register, while '1' selects its data register
(the latter is for characters to be displayed in the LCD).

■ R/W- (read/write-): If '0', the next E (enable) pulse will cause the present instruction or data to
be written into the controller’s register selected by RS, while '1' causes data to be read from the
controller’s register.

■ DB (data bus): 8-bit bus whose content (data or instruction) is written into the controller’s register at the
next pulse of E if R/W-='0', or through which data is read from the controller’s register if R/W-='1'.

■ E (enable): Must be pulsed high to write anything into the controller’s register (the actual writing
occurs at the negative edge of E). A simplified timing diagram for E is shown in Figure 23.12.

FIGURE 23.11. (a) 16 x 2 LCD; (b) Corresponding pinout (pins 15 and 16 are optional).


The signals above are sent to the controller. The main signal received from the controller is described below. 


■ BusyFlag: This signal is provided by the controller through bit DB(7) of the data bus, with '1' indicating
that the controller is busy. In practice, the use of this signal is normally avoided by simply adopting
for the instructions a time separation longer than the maximum required for the instructions to be
completed (listed later in the table of Figure 23.13). If so, R/W- can be kept permanently low.


The controller’s instruction set is shown in Figure 23.13 along with explanatory comments. Even though
their usage will be illustrated in a design example ahead, a summary of their main features follows.

■ Display clear with reset of memory address (Clear Display) or only reset of memory address
(Return Home).

■ Increment or decrement of display position (Entry Mode Set), plus individual shift or not for
display and cursor (Cursor or Display Shift).

■ Individual choice of display, cursor, and blink ON/OFF modes (Display ON/OFF Control).

■ Operation with 4- or 8-bit bus with one line of 5 x 8- or 5 x 10-dot characters or with two lines of 5 x 8-dot
characters (Function Set).

■ 7 bits for display character addressing (Set DD RAM Address), allowing individual access to 128
LCD positions, divided into two rows of 64 characters each. The address of the first character in the
first row is 0, while the address of the first character in the second row is 64, regardless of the actual
number of characters in the LCD (hence accommodating LCDs as big as 64 x 2).


FIGURE 23.12. Simplified LCD controller (HD44780U or KS0066U) timing diagram for (a) write and (b) read
operations.

Figure 23.13 also shows the maximum execution times for all instructions, which are for the controller’s
internal oscillator operating at 270 kHz. This frequency is set by an external resistor between 75 kΩ (for
VDD = 3 V) and 91 kΩ (for VDD = 5 V). If a different frequency is employed (with different resistor values
the range that can be covered is roughly 100 kHz-500 kHz), then the execution times must be multiplied
by 270 kHz/f_oscillator.
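As a quick worked example of this scaling rule, consider an oscillator running at 200 kHz (an assumed value, chosen only for illustration within the 100 kHz-500 kHz range). The 37 µs and 1.52 ms figures are the execution times quoted in this section for the 270 kHz case.

```latex
\[
t_{\text{exec}}(f_{\text{oscillator}}) = t_{\text{exec}}(270\,\text{kHz}) \times \frac{270\,\text{kHz}}{f_{\text{oscillator}}}
\]
\[
f_{\text{oscillator}} = 200\,\text{kHz}: \qquad
37\,\mu\text{s} \times \frac{270}{200} = 49.95\,\mu\text{s}, \qquad
1.52\,\text{ms} \times \frac{270}{200} = 2.052\,\text{ms}
\]
```

A slower oscillator therefore lengthens every execution time by the same factor, which must be taken into account when the busy flag is not checked.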


One important design consideration is the controller’s initialization procedure, which can be done in
two ways: automatically at power up or by instructions. The latter can be used when the power supply
conditions for automatic initialization are not met or for safety. It consists of the following:

1. Turn the power ON.
2. Wait >15 ms after VDD rises to 4.5 V (or >40 ms after 2.7 V).
3. Execute instruction “Function set” (37 µs) with RW='0', RS='0', DB="0011XXXX".
4. Wait >4.1 ms.
5. Execute instruction “Function set” (37 µs) with RW='0', RS='0', DB="0011XXXX".
6. Wait >100 µs.
7. Execute instruction “Function set” (37 µs) with RW='0', RS='0', DB="0011XXXX".
8. Execute instruction “Function set” (37 µs) with RW='0', RS='0', DB="0011NFXX" (choose N and F).
9. Execute instruction “Display on/off control” (37 µs) with RW='0', RS='0', DB="00001000".
10. Execute instruction “Clear display” (1.52 ms) with RW='0', RS='0', DB="00000001".
11. Execute instruction “Entry mode set” (37 µs) with RW='0', RS='0', DB="000001 I/D S" (choose I/D
and S).

(Note: The initialization procedure for KS0066U is slightly simpler; consult data sheets for details.)


Instruction                       RS  R/W-  DB7...DB0         Max. time (*)   Description
 1) Clear Display                 0   0     00000001          1.52 ms         Clears display and sets DD RAM address to zero.
 2) Return Home                   0   0     0000001X          1.52 ms         Returns display to origin and sets DD RAM address to zero.
                                                                              X=don't care
 3) Entry Mode Set                0   0     000001 I/D S      37 µs           Sets cursor direction and display shift during read and write.
                                                                              I/D=1 increment DD RAM address, =0 decrement
                                                                              S=1 shift display, =0 do not shift
 4) Display ON/OFF Control        0   0     00001DCB          37 µs           D=1 display on, =0 off
                                                                              C=1 cursor on, =0 off
                                                                              B=1 blink char., =0 do not blink
 5) Cursor or Display Shift       0   0     0001 S/C R/L XX                   Moves cursor or display without changing DD RAM contents.
                                                                              S/C=1 shift display, =0 shift cursor
                                                                              R/L=1 shift to right, =0 shift to left
 6) Function Set                  0   0     001 DL N F XX     37 µs           Sets bus size, number of lines, and digit size (font).
                                                                              DL=1 8-bit bus, =0 4-bit bus
                                                                              N=0 1-line operation, =1 2-line
                                                                              F=0 5x8 dots, =1 5x10 dots
 7) Set CG RAM Address            0   0     01AAAAAA                          Sets CG RAM address to AAAAAA.
 8) Set DD RAM Address            0   0     1AAAAAAA                          Sets DD RAM address to AAAAAAA.
 9) Read Busy Flag and Address    0   1     BF AAAAAAA                        Reads busy flag and address counter.
10) Write Data to CG or DD RAM    1   0     DDDDDDDD                          Writes data into DD RAM or CG RAM (defined by last DD or
                                                                              CG RAM address set).
11) Read Data from CG or DD RAM   1   1     DDDDDDDD                          Reads data from DD RAM or CG RAM (defined by last DD or
                                                                              CG RAM address set).

(*) For 270 kHz internal oscillator; for other frequencies (100 to 500 kHz), multiply time given by 270 kHz/f_oscillator.

FIGURE 23.13. LCD controller (HD44780U or KS0066U) instruction set.
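When these instructions are later coded in VHDL, it can help to give the bit patterns names instead of repeating the literals. The package below is a minimal sketch of that idea; the package name lcd_instructions, the subtype byte, and all constant names are hypothetical, while the bit patterns follow the instruction formats listed in this section (with the don't-care bits set to '0').

```vhdl
PACKAGE lcd_instructions IS            -- hypothetical package name
   SUBTYPE byte IS BIT_VECTOR(7 DOWNTO 0);
   -- All codes below are written with RS='0' and RW='0'
   CONSTANT CLEAR_DISPLAY  : byte := "00000001";
   CONSTANT RETURN_HOME    : byte := "00000010";  -- 0000001X with X='0'
   CONSTANT ENTRY_MODE_INC : byte := "00000110";  -- I/D=1 (increment), S=0 (no shift)
   CONSTANT DISPLAY_OFF    : byte := "00001000";  -- D=0, C=0, B=0
   CONSTANT DISPLAY_ON     : byte := "00001100";  -- D=1, C=0, B=0
   CONSTANT FUNCTION_SET   : byte := "00111000";  -- DL=1 (8-bit), N=1 (2-line), F=0 (5x8)
END lcd_instructions;
```

With a clause such as USE work.lcd_instructions.all, an assignment like DB <= FUNCTION_SET then documents itself, at the cost of one extra design unit to compile.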


Design example 


We present next the design of a circuit that interfaces with a 16 x 2 LCD (equipped with an HD44780U 
or KS0066U controller) to have it display the word “VHDL.” The LCD is set to operate in 2-line 8-bit bus 
mode, covering the following three cases: 


■ Case 1: With “VHDL” written in the first four positions of the first line (Figure 23.14).

■ Case 2: Again with “VHDL” written in the first four positions, like above, but blinking with a
frequency of 2 Hz (twice per second).

■ Case 3: With “VH” written in the first two positions of the first line and “DL” in the first two
positions of the second line.


Case 1 With “VHDL” written in the first line 


The FSM approach (Chapter 15 and Section 19.16) was employed to design this circuit, whose state 
transition diagram is presented in Figure 23.15. Note that it includes the initialization by instructions 
(just in case the controller has not been properly initialized at power up), which consists of all steps listed 
earlier, performed by the states shown on the left of Figure 23.15 (from FunctionSet1 to EntryMode). A reset 
by hardware was also included (shown on the right of Figure 23.15), which consists of an RC circuit with a
time constant of 390 ms, causing rst to be momentarily high every time the circuit is powered up. A circuit
with a potentiometer for contrast adjustment (to be connected to pin 3 of the LCD controller) was also 
included on the right of Figure 23.15. 

As mentioned above, the initialization and setup procedure consists of seven states, which are shown 
on the left of Figure 23.15. The initial four states (FunctionSet1-4) initialize the controller in which the LCD 
is set to operate with an 8-bit bus and 2-line mode, with 5 x 8-dot characters. The fifth state (ClearDisplay) 
causes the display to be cleared and the memory address to be zeroed (cursor returns to the beginning 
of line 1). In the sixth state (DisplayControl) the display is turned ON, while the cursor and blink are kept 
OFF. Finally, in the seventh state (EntryMode), the RAM address is set to increment mode. Note that in all
seven states RS is kept low to select the instruction register, and so is R/W-, because information must
always be written into and never read from the controller (at the next E pulse).

FIGURE 23.14. Text to be displayed by the LCD in case 1 of the design example.

The five FSM states shown on the right of Figure 23.15 are where the task proper is executed. In the 
first four, the characters 'V', 'H', 'D', and 'L' are written into the data register when E is pulsed high (recall,
however, that the actual writing occurs at the negative edge of E). During these four states, RS is kept high, 
thus selecting the data register. The corresponding values for DB are obtained from the controller’s data 
sheet, that is, 'V'="01010110", 'H'="01001000", 'D'="01000100", and 'L'="01001100". Finally, in the last 
state (ReturnHome), the memory address is zeroed (that is, the cursor returns to the beginning of line 1), 
but without clearing the display, so the characters are overwritten. 

A low-frequency clock (500 Hz) can be used to move the FSM from one state to another, in which case 
the busy-flag bit does not need to be checked because then every instruction will have 2ms to complete, 
which is more than any execution time listed in the last column of Figure 23.13. This clock can play 
the role of E (Enable), which must be pulsed high to write any instruction or data into the controller. 
However, because the actual writing occurs at the negative edge of E, the machine must move from 
one state to another at the positive transition of E, such that RS, R/W-, and DB will be firmly available 
(ready) when E’s negative edge occurs. 

The corresponding VHDL code is shown below. The only inputs are clk and rst, while the outputs are RS, 
RW, E, and DB. Note that a generic parameter, called clk_divider, was declared in line 3, with a value of 50,000 
in this example, because the system clock was assumed to be 25MHz, which must therefore be divided 
by 50K to attain the desired 500Hz clock for the FSM (and signal E). The enumerated data type (called 
state) for the FSM was created in lines 11-13, containing exactly the same names shown in the diagram of 
Figure 23.15. Three processes were employed in the architecture. The first (lines 17-27) is responsible for
generating the 500 Hz clock, while the other two implement the FSM proper. The lower section of the FSM 
(which contains the flip-flops) is in lines 29-38, while the upper section (combinational logic) is in lines 
40-92. Note that the last process is a direct translation of the state transition diagram of Figure 23.15. 


FIGURE 23.15. FSM for a circuit that writes the word “VHDL” on a 16 x 2 LCD (display contrast and FSM reset
circuits are also shown).

1  ------------------------------------------------------
2  ENTITY lcd_driver IS
3     GENERIC (clk_divider: INTEGER := 50000); --25MHz to 500Hz
4     PORT (clk, rst: IN BIT;
5        RS, RW: OUT BIT;
6        E: BUFFER BIT;
7        DB: OUT BIT_VECTOR(7 DOWNTO 0));
8  END lcd_driver;
9  ------------------------------------------------------
10 ARCHITECTURE lcd_driver OF lcd_driver IS
11    TYPE state IS (FunctionSet1, FunctionSet2, FunctionSet3,
12       FunctionSet4, ClearDisplay, DisplayControl, EntryMode,
13       WriteData1, WriteData2, WriteData3, WriteData4, ReturnHome);
14    SIGNAL pr_state, nx_state: state;
15 BEGIN
16    ----- Clock generator (E->500Hz): -----------------
17    PROCESS (clk)
18       VARIABLE count: INTEGER RANGE 0 TO clk_divider;
19    BEGIN
20       IF (clk'EVENT AND clk='1') THEN
21          count := count + 1;
22          IF (count=clk_divider) THEN
23             E <= NOT E;
24             count := 0;
25          END IF;
26       END IF;
27    END PROCESS;
28    ----- Lower section of FSM: -----------------------
29    PROCESS (E)
30    BEGIN
31       IF (E'EVENT AND E='1') THEN
32          IF (rst='1') THEN
33             pr_state <= FunctionSet1;
34          ELSE
35             pr_state <= nx_state;
36          END IF;
37       END IF;
38    END PROCESS;
39    ----- Upper section of FSM: -----------------------
40    PROCESS (pr_state)
41    BEGIN
42       CASE pr_state IS
43          WHEN FunctionSet1 =>
44             RS<='0'; RW<='0';
45             DB <= "00111000";
46             nx_state <= FunctionSet2;
47          WHEN FunctionSet2 =>
48             RS<='0'; RW<='0';
49             DB <= "00111000";
50             nx_state <= FunctionSet3;
51          WHEN FunctionSet3 =>
52             RS<='0'; RW<='0';
53             DB <= "00111000";
54             nx_state <= FunctionSet4;
55          WHEN FunctionSet4 =>
56             RS<='0'; RW<='0';
57             DB <= "00111000";
58             nx_state <= ClearDisplay;
59          WHEN ClearDisplay =>
60             RS<='0'; RW<='0';
61             DB <= "00000001";
62             nx_state <= DisplayControl;
63          WHEN DisplayControl =>
64             RS<='0'; RW<='0';
65             DB <= "00001100";
66             nx_state <= EntryMode;
67          WHEN EntryMode =>
68             RS<='0'; RW<='0';
69             DB <= "00000110";
70             nx_state <= WriteData1;
71          WHEN WriteData1 =>
72             RS<='1'; RW<='0';
73             DB <= "01010110"; --'V'
74             nx_state <= WriteData2;
75          WHEN WriteData2 =>
76             RS<='1'; RW<='0';
77             DB <= "01001000"; --'H'
78             nx_state <= WriteData3;
79          WHEN WriteData3 =>
80             RS<='1'; RW<='0';
81             DB <= "01000100"; --'D'
82             nx_state <= WriteData4;
83          WHEN WriteData4 =>
84             RS<='1'; RW<='0';
85             DB <= "01001100"; --'L'
86             nx_state <= ReturnHome;
87          WHEN ReturnHome =>
88             RS<='0'; RW<='0';
89             DB <= "10000000";
90             nx_state <= WriteData1;
91       END CASE;
92    END PROCESS;
93 END lcd_driver;
94 ------------------------------------------------------
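As a quick check of the clock-generator arithmetic, the relation below takes the process in lines 17-27 literally: E is inverted once every clk_divider cycles of clk, so one full period of E spans 2 x clk_divider clock cycles. The symbols f_E and T_E (frequency and period of E) are introduced here only for this calculation, and the value clk_divider = 25,000 in the last line is an assumption used for comparison.

```latex
\[
f_E = \frac{f_{clk}}{2 \times \text{clk\_divider}}
\]
\[
\text{clk\_divider} = 50{,}000: \qquad
f_E = \frac{25\,\text{MHz}}{2 \times 50{,}000} = 250\,\text{Hz}, \qquad T_E = 4\,\text{ms}
\]
\[
\text{clk\_divider} = 25{,}000: \qquad
f_E = \frac{25\,\text{MHz}}{2 \times 25{,}000} = 500\,\text{Hz}, \qquad T_E = 2\,\text{ms}
\]
```

Under this reading, the listed value of 50,000 gives each instruction 4 ms rather than 2 ms. Either period is longer than the largest execution time quoted in this section (1.52 ms), so the busy flag can still be ignored.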


Case 2 With “VHDL” blinking 


The new VHDL code is shown below. The only difference with respect to the previous solution is that
it now contains a built-in timer in the lower section’s process, which acts as a secondary FSM, causing
the system to operate as a normal FSM for 0.25 s (lines 38 and 39), then stay in the state ClearDisplay
for 0.25 s (lines 40-43), hence resulting in two blinks per second. The blinking frequency (2 Hz) can be
easily modified by simply changing the value of the constant timer_limit specified in line 15.


1   ------------------------------------------------------
2   ENTITY lcd_driver IS
3      GENERIC (clk_divider: INTEGER := 50000); --25MHz->500Hz
4      PORT (clk, rst: IN BIT;
5         RS, RW: OUT BIT;
6         E: BUFFER BIT;
7         DB: OUT BIT_VECTOR(7 DOWNTO 0));
8   END lcd_driver;
9   ------------------------------------------------------
10  ARCHITECTURE lcd_driver OF lcd_driver IS
11     TYPE state IS (FunctionSet1, FunctionSet2, FunctionSet3, FunctionSet4,
12        ClearDisplay, DisplayControl, EntryMode, WriteData1, WriteData2,
13        WriteData3, WriteData4, ReturnHome);
14     SIGNAL pr_state, nx_state: state;
15     CONSTANT timer_limit: INTEGER := 250; --500Hz/250=2Hz
16  BEGIN
17     ----- Clock generator (E->500Hz): -----------------
18     PROCESS (clk)
19        VARIABLE count: INTEGER RANGE 0 TO clk_divider;
20     BEGIN
21        IF (clk'EVENT AND clk='1') THEN
22           count := count + 1;
23           IF (count=clk_divider) THEN
24              E <= NOT E;
25              count := 0;
26           END IF;
27        END IF;
28     END PROCESS;
29     ----- Lower section of FSM: -----------------------
30     PROCESS (E)
31        VARIABLE timer: INTEGER RANGE 0 TO timer_limit;
32     BEGIN
33        IF (E'EVENT AND E='1') THEN
34           IF (rst='1') THEN
35              pr_state <= FunctionSet1;
36           ELSE
37              timer := timer + 1;
38              IF (timer<timer_limit/2) THEN
39                 pr_state <= nx_state;
40              ELSE
41                 pr_state <= ClearDisplay;
42                 IF (timer=timer_limit) THEN
43                    timer := 0;
44                 END IF;
45              END IF;
46           END IF;
47        END IF;
48     END PROCESS;
49     ----- Upper section of FSM: -----------------------
50     PROCESS (pr_state)
51     BEGIN
52        CASE pr_state IS
53           WHEN FunctionSet1 =>
54              RS<='0'; RW<='0';
55              DB <= "00111000";
56              nx_state <= FunctionSet2;
57           WHEN FunctionSet2 =>
58              RS<='0'; RW<='0';
59              DB <= "00111000";
60              nx_state <= FunctionSet3;
61           WHEN FunctionSet3 =>
62              RS<='0'; RW<='0';
63              DB <= "00111000";
64              nx_state <= FunctionSet4;
65           WHEN FunctionSet4 =>
66              RS<='0'; RW<='0';
67              DB <= "00111000";
68              nx_state <= ClearDisplay;
69           WHEN ClearDisplay =>
70              RS<='0'; RW<='0';
71              DB <= "00000001";
72              nx_state <= DisplayControl;
73           WHEN DisplayControl =>
74              RS<='0'; RW<='0';
75              DB <= "00001100";
76              nx_state <= EntryMode;
77           WHEN EntryMode =>
78              RS<='0'; RW<='0';
79              DB <= "00000110";
80              nx_state <= WriteData1;
81           WHEN WriteData1 =>
82              RS<='1'; RW<='0';
83              DB <= "01010110"; --'V'
84              nx_state <= WriteData2;
85           WHEN WriteData2 =>
86              RS<='1'; RW<='0';
87              DB <= "01001000"; --'H'
88              nx_state <= WriteData3;
89           WHEN WriteData3 =>
90              RS<='1'; RW<='0';
91              DB <= "01000100"; --'D'
92              nx_state <= WriteData4;
93           WHEN WriteData4 =>
94              RS<='1'; RW<='0';
95              DB <= "01001100"; --'L'
96              nx_state <= ReturnHome;
97           WHEN ReturnHome =>
98              RS<='0'; RW<='0';
99              DB <= "10000000";
100             nx_state <= WriteData1;
101       END CASE;
102    END PROCESS;
103 END lcd_driver;
104 ------------------------------------------------------


Case 3 With “VH” in one line and “DL” in the other 


In this part, “VH” must be written in the first line and “DL” in the second. This means that the automatic
address increment is not enough because after “VH” it will point to address 2. Recall that the LCD
controller (HD44780U or KS0066U) employs 7 bits to address the LCD characters, hence with a total of
128 addresses, distributed in two lines of 64 addresses each (0-63 in the first line, 64-127 in the second).
This addressing scheme is independent of the actual LCD size (which can be as big as 64 x 2). In our
case, the LCD size is 16 x 2, but still the first character in the first line is at address 0, while the first in the
second line is at address 64. To design our circuit, the only change needed in the solution of case 1 is the
inclusion of an extra state (called SetAddress), that is:


WHEN SetAddress =>
   RS<='0'; RW<='0';
   DB <= "11000000";
   nx_state <= WriteData3;


This state must be located between the states WriteData2 and WriteData3, so the next state of WriteData2
must be changed from WriteData3 to SetAddress. Notice that in this state DB="11000000", which performs
the instruction “Set DD RAM Address” depicted in Figure 23.13, setting the RAM address to 64. Obviously,
the new state (SetAddress) must also be included in the enumerated data type state of lines 11-13 in the
solution for case 1.
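The value of DB for any other display position follows the same rule: a leading '1' followed by the 7-bit address 64 x row + column. The function below is a minimal sketch of that computation; the package name lcd_addr and the function name set_ddram are hypothetical, and the standard package ieee.numeric_bit is used only for the integer-to-bit conversion.

```vhdl
LIBRARY ieee;
USE ieee.numeric_bit.all;

PACKAGE lcd_addr IS                    -- hypothetical helper package
   -- row: 0 (first line) or 1 (second line); col: 0 to 63
   FUNCTION set_ddram (row: NATURAL; col: NATURAL) RETURN BIT_VECTOR;
END lcd_addr;

PACKAGE BODY lcd_addr IS
   FUNCTION set_ddram (row: NATURAL; col: NATURAL) RETURN BIT_VECTOR IS
   BEGIN
      -- "Set DD RAM Address" instruction: '1' followed by the 7-bit address
      RETURN '1' & BIT_VECTOR(to_unsigned(64*row + col, 7));
   END set_ddram;
END lcd_addr;
```

For the SetAddress state above, set_ddram(1, 0) yields "11000000", the same value assigned to DB, while set_ddram(0, 0) yields "10000000", the value used in the ReturnHome state of case 1.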


23.5 Exercises 


In all exercises below, the VHDL code must be written, compiled, debugged, and then carefully simulated. 
1. String detector 


Improve the VHDL code for the string detector seen in Section 23.1 by adding the capability to 
detect also capital letters (that is,"m" or "M,""p" or "P," and "3"). 


2. “Universal” signal generator 


Using VHDL and the FSM approach, design the signal generator described in Exercise 15.15. 


598 CHAPTER 23 VHDL Design of State Machines 


3. 


Switch debouncer 


A switch debouncer was designed in Section 22.2 using regular sequential code. Redo that 
design using the FSM approach (still using VHDL, of course). Which approach is simpler in 
this case? 


4. Two-window signal generator


a. Using VHDL, but not the FSM approach, design a circuit for the two-window signal generator 
of Exercise 14.33. 


b. Redo the design using VHDL and the FSM approach. Which technique is simpler in this case?


5. Programmable two-window signal generator


Using VHDL and employing either regular sequential code or the FSM approach, design the pro- 
grammable two-window signal generator of Exercise 14.34. 


6. Car alarm #1


a. Explain why the FSM shown in Figure 23.5(b) needs improvements to implement the car alarm 
described in Section 23.3. 


b. Using VHDL and the FSM approach, redesign the car alarm seen in case 1 of Section 23.3, but 
use the approach depicted in Figure 23.6(a) instead of that in Figure 23.6(b). 


7. Car alarm #2


a. Explain why the FSM shown in Figure 23.9(a) needs improvements to implement the complete 
car alarm seen in case 3 of Section 23.3. 


b. Using VHDL and the FSM approach, redesign that car alarm (with debouncers and ON/ 
OFF chirps included), but use the approach depicted in Figure 23.9(b) instead of that in 
Figure 23.9(c). 


8. Garage-door controller


Design the garage-door controller of Exercise 15.18. However, include in the circuit a 30s timer for 
automatic door closing. 


9. Door lock


Design an FSM to control a door lock with the features below.


a. The password should consist of three numeric digits from 0 to 9 (consider that it is 123);


b. To simplify the design, assume that a numeric keypad with digits from 0 to 9, represented using
the BCD code, is used to generate the input signal;


c. Assume also that when no key is pressed the keypad outputs “1111” (=15);


d. An LED should indicate the status of the door lock (ON when locked, OFF when unlocked);


e. The maximum time interval between key punches should be 3 seconds (call it time1);


f. After accepting the password, the lock must be unlocked and the LED turned OFF, remaining so
for 5 seconds (call it time2);


g. Assume that there is no limit on the number of password trials.


Note: Enter time1 and time2 using GENERIC. Their values should be the number of clock cycles to produce
3- and 5-second delays, respectively. To make the simulations simpler to inspect, just consider time1 = 3
and time2 = 5 when simulating the circuit.


10. LCD driver 


a. Using your CPLD/FPGA development kit, plus a 16x2 LCD display, implement all three 
designs presented in Section 23.4 to get acquainted with this type of application. Do not forget 
to include the contrast control circuit shown in Figure 23.15. 


b. Modify the code for it to display again the word “VHDL,” but one character at a time (from "V" to
"L"), with a time separation between them of 0.5 s. Two seconds after the word has been completed,
erase it and restart, waiting 0.5 s before entering the first character.


11. Signal generator with a quasi-single machine 


a. Redo the design of Section 23.2, this time using the quasi-single-machine approach described in 
Section 15.6. 


b. In this example, was the solution advantageous over that in Section 23.2? 


Simulation with VHDL Testbenches


Objective: As mentioned earlier, VHDL allows circuit synthesis as well as circuit simulation. In the 
preceding chapters we concentrated on the former, so all VHDL statements and constructs employed 
there are synthesizable. We turn now to circuit simulation, where fundamental simulation procedures 
are introduced and then illustrated by means of complete examples. A brief tutorial on ModelSim, a 
popular simulator for VHDL-based designs, which was employed to simulate all examples shown in 
this chapter, is included in Appendix A. 


24.1 Synthesis versus Simulation


VHDL is intended for circuit synthesis as well as circuit simulation. Synthesis is the process of translating 
a source code into a set of hardware structures that implement the functionalities described in the code. 
Circuit simulation, on the other hand, is a testing procedure used to ensure that the synthesized circuit 
does implement the intended behavior (normally performed before any physical implementation actu- 
ally takes place). 

The general simulation procedure is illustrated in Figure 24.1, which shows the design under test 
(DUT) in the center, the stimuli applied to the DUT on the left, and the corresponding DUT’s response 
on the right. Two VHDL files are mentioned, referred to as design file and test file. The former contains 
the code from which the circuit (DUT) is inferred, while the latter contains the testbench plus code for 
interfacing with the design file. As shown in the figure, the testbench is indeed composed of two parts: 
the first is for input stimulus generation, while the second (optional) is for output verification (that is, for 
comparing the actual outputs against expected templates). 

The design file implements the circuit (DUT), so its construction is that already described in the previous
chapters. On the other hand, the test file is for simulations only, so its construction is described here.
Such a file may include the circuit’s internal propagation delays or not, and it may also include expected
results or not. This kind of classification is described in the next section.


FIGURE 24.1. Circuit simulation with VHDL. Two files are needed, with the design file containing the DUT
and the test file containing the testbench. (The diagram shows the stimuli entering the DUT on the left and
the DUT's response leaving on the right, with the DUT enclosed by the test file.)


24.2  Testbench Types 


The table below shows four types of simulation, which depend on whether the circuit's internal propaga- 
tion delays are included (timing analysis) or not (functional analysis), and also on whether the analysis 
of the results is conducted by visual inspection (manual analysis) or using VHDL (automated analysis). 
A brief description of each one follows. 


Type   Circuit’s propagation delays          Output waveform analysis
I      Not included (functional analysis)    Visual inspection (manual analysis)
II     Included (timing analysis)            Visual inspection (manual analysis)
III    Not included (functional analysis)    With VHDL (automated analysis)
IV     Included (timing analysis)            With VHDL (automated analysis)


■ Type I testbench: In this case the DUT’s internal delays are not considered and the output is manually
verified (by visual inspection), being therefore the simplest kind of VHDL code for simulation.
This testing procedure, also referred to as stimulus-only functional analysis, will be illustrated in
Examples 24.2 and 24.3.


■ Type II testbench: In this case the DUT’s internal delays are taken into account, but the output is still
manually verified (by visual inspection). This testing procedure, also referred to as stimulus-only
timing analysis, will be illustrated in Examples 24.4 and 24.5.


■ Type III testbench: In this case the DUT’s internal delays are not considered but the output is automatically
verified by the simulator. This testing procedure is also referred to as automated functional analysis.


■ Type IV testbench: The DUT’s internal delays are taken into account and the output is automatically
verified by the simulator. This is the most complete and also the most complex type of VHDL code
for simulation. Also referred to as full-bench or automated timing analysis, this testing procedure will
be illustrated in Example 24.6.


A final comment concerns the VHDL statements needed to construct the testbenches and delayed
DUTs. Only two time-related statements are needed: WAIT and AFTER. However, because the former can
do anything that the latter can, only WAIT is actually needed.


24.3 Stimulus Generation 


Before we start working on actual simulations, it is necessary to learn how to construct the stimuli that 
will be included in the test file. 

Figure 24.2 shows five typical waveforms used in circuit simulations. In (a), a regular repetitive signal 
is depicted (with period 2T), typical of a clock. In (b), a single-pulse stimulus is shown, typical of reset. 
In (c), an irregular nonrepetitive signal appears. In (d), the signal is again irregular but repetitive. Finally, 
in (e), a multibit waveform is presented. 

We describe below how VHDL can be used to create each one of these waveforms (T=30ns will be 
assumed). Note in the codes that the only time-related statements are WAIT and AFTER, and that indeed 
any signal can be generated using only the former. 


Case 1 Generation of a regular repetitive waveform (Figure 24.2(a))


Option 1 (compact, but AFTER might not be supported): 


SIGNAL clk: BIT := '1';


clk <= NOT clk AFTER 30 ns;


FIGURE 24.2. Typical stimulus waveforms: (a) regular repetitive waveform; (b) single-pulse waveform;
(c) irregular finite waveform; (d) irregular repetitive waveform; (e) multibit waveform.


Option 2 (recommended):


SIGNAL clk: BIT := '1';


WAIT FOR 30 ns;
clk <= NOT clk;


Option 3 (explicit assignments):


SIGNAL clk: BIT := '1';


WAIT FOR 30 ns;
clk <= '0';
WAIT FOR 30 ns;
clk <= '1';
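

The fragments above omit the surrounding entity, architecture, and process. As a minimal sketch only (the names clk_gen_demo, clk_after, and clk_wait are hypothetical), the file below places the AFTER-based and the WAIT-based clock generators side by side so that both can be viewed in the simulator; a third process samples the two signals midway between edges and reports any difference.

```vhdl
ENTITY clk_gen_demo IS             -- hypothetical name, simulation only
END clk_gen_demo;

ARCHITECTURE sketch OF clk_gen_demo IS
   SIGNAL clk_after: BIT := '1';   -- generated with AFTER
   SIGNAL clk_wait:  BIT := '1';   -- generated with WAIT
BEGIN
   ---- AFTER-based generation (concurrent): ----
   -- Re-evaluated at every event on clk_after, hence toggles every 30 ns
   clk_after <= NOT clk_after AFTER 30 ns;

   ---- WAIT-based generation (inside a process): ----
   PROCESS
   BEGIN
      WAIT FOR 30 ns;
      clk_wait <= NOT clk_wait;
   END PROCESS;

   ---- Comparison, sampled at 15 ns, 45 ns, 75 ns, ...: ----
   PROCESS
   BEGIN
      WAIT FOR 15 ns;
      LOOP
         ASSERT (clk_after = clk_wait)
            REPORT "The two clock generators disagree"
            SEVERITY ERROR;
         WAIT FOR 30 ns;
      END LOOP;
   END PROCESS;
END sketch;
```

The samples are taken away from the edges on purpose: the WAIT-based signal is updated one simulation (delta) cycle after the AFTER-based one, so a comparison made exactly at an edge could see the two signals momentarily different.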


Case 2 Generation of a single-pulse waveform (Figure 24.2(b))


Note that the last WAIT statement in the code below is unbounded. 


SIGNAL rst: BIT := '0'; 


WAIT FOR 30 ns; 


rst <= '1';
WAIT FOR 30 ns; 
rst <= '0'; 
WAIT; 


Case 3 Generation of an irregular nonrepetitive waveform (Figure 24.2(c))


Instead of repeating the WAIT statement several times, LOOP was employed in the code below, with
the waveform values assigned first to a CONSTANT (called wave). Note that the last WAIT is again
unbounded.


CONSTANT wave: BIT_VECTOR(1 TO 8) := "10110100";
SIGNAL x: BIT := '0';


FOR i IN wave'RANGE LOOP
   x <= wave(i);
   WAIT FOR 30 ns;
END LOOP;
WAIT;


Case 4 Generation of an irregular repetitive waveform (Figure 24.2(d))


The only difference with respect to the case above is the removal of the last WAIT. 


CONSTANT wave: BIT_VECTOR(1 TO 8) := "10110100"; 
SIGNAL y: BIT := '0'; --initial value unnecessary 


FOR i IN wave'RANGE LOOP 
y <= wave(i); 
WAIT FOR 30 ns; 

END LOOP; 


Case 5 Generation of a multibit waveform (Figure 24.2(e))


In this case an integer was employed to generate an 8-bit waveform. Note that a signal can be declared 
without an initial value (which must then obviously be included in the code, as done below). The wave- 
forms generated from this code are repetitive. 


SIGNAL z: INTEGER RANGE 0 TO 255; 


z <= 0;
WAIT FOR 120 ns;
z <= 33;
WAIT FOR 120 ns;
z <= 255;
WAIT FOR 60 ns;
z <= 99;
WAIT FOR 180 ns;
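

The fragments of Cases 3 to 5 can be tried out together in a single file. What follows is a minimal sketch, assuming one process per signal and a hypothetical entity name (stimuli_demo); the statements inside each process are those shown above.

```vhdl
ENTITY stimuli_demo IS             -- hypothetical name, simulation only
END stimuli_demo;

ARCHITECTURE sketch OF stimuli_demo IS
   CONSTANT wave: BIT_VECTOR(1 TO 8) := "10110100";
   SIGNAL x: BIT := '0';               -- Case 3: irregular, nonrepetitive
   SIGNAL y: BIT := '0';               -- Case 4: irregular, repetitive
   SIGNAL z: INTEGER RANGE 0 TO 255;   -- Case 5: multibit, repetitive
BEGIN
   ---- generate x (runs once, then suspends forever): ----
   PROCESS
   BEGIN
      FOR i IN wave'RANGE LOOP
         x <= wave(i);
         WAIT FOR 30 ns;
      END LOOP;
      WAIT;
   END PROCESS;

   ---- generate y (process restarts, so the pattern repeats): ----
   PROCESS
   BEGIN
      FOR i IN wave'RANGE LOOP
         y <= wave(i);
         WAIT FOR 30 ns;
      END LOOP;
   END PROCESS;

   ---- generate z (repeats every 480 ns): ----
   PROCESS
   BEGIN
      z <= 0;
      WAIT FOR 120 ns;
      z <= 33;
      WAIT FOR 120 ns;
      z <= 255;
      WAIT FOR 60 ns;
      z <= 99;
      WAIT FOR 180 ns;
   END PROCESS;
END sketch;
```

With T=30ns the 8-value pattern lasts 240ns, so a simulation time of 480ns shows x once, y twice, and one full period of z.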


24.4 Testing the Stimuli 


This section describes a procedure for testing not the DUT, but the testbenches themselves. This is 
important in order to make sure that the signals being applied to the circuit are indeed the intended 
signals. 

A template for that purpose is shown in Figure 24.3. As can be seen, it contains the same three 
classical code sections seen in Section 19.2, that is, library declarations (if necessary), ENTITY, 
and ARCHITECTURE. However, note that ENTITY (lines 5 and 6) is empty, and that ARCHITECTURE 
(lines 8-27) contains only signal declarations (lines 9 and 10) and stimulus generation (lines 13-17 for 
a and 19-26 for b). Note also that only WAIT was employed in the time-related statements (no AFTER). 
This test file can be entered in the simulator without a DUT (design file), in which case the simula- 
tor will simply display the stimuli inferred from the corresponding testbench. The complete testing 
procedure is illustrated in the example below. 


1  ----------------------------------------------
2  LIBRARY ieee;                                   --library declarations
3  USE ieee.std_logic_1164.all;
4  ----------------------------------------------
5  ENTITY test_testbench IS                        --entity (empty)
6  END test_testbench;
7  ----------------------------------------------
8  ARCHITECTURE testbench OF test_testbench IS     --architecture
9     SIGNAL a: STD_LOGIC := '1';                  --signal declarations
10    SIGNAL b: STD_LOGIC := '0';
11 BEGIN
12    ------- generate a: -------                  --stimuli generation
13    PROCESS
14    BEGIN
15       WAIT FOR 30 ns;
16       a <= NOT a;
17    END PROCESS;
18    ------- generate b: -------
19    PROCESS
20    BEGIN
21       WAIT FOR 30 ns;
22       b <= '1';
23       WAIT FOR 30 ns;
24       b <= '0';
25       WAIT;
26    END PROCESS;
27 END testbench;
28 ----------------------------------------------


FIGURE 24.3. Template for a test file aimed not at simulation but at testing the testbench itself. 


EXAMPLE 24.1 TESTING A TESTBENCH


Write a test file containing a testbench that generates the waveforms of Figures 24.2(a)-(b), then test 
it without applying the waveforms to any circuit. 


SOLUTION 


The solution is divided into two parts: test file and simulation. 

Test file: The test file is shown in the template of Figure 24.3. 
Simulation: A brief tutorial on ModelSim, which is a simulator for VHDL- and Verilog-based designs, 
is included in Appendix A. Consequently, to simulate the file above, the reader is invited to go to that 
appendix and follow the steps described there (feel free to use any other simulator). In the tutorial, 
it is assumed that two files (design + test) are available, which is the case in real simulations (seen
ahead). However, because in the present example we simply want to test the test file, the tutorial
should be followed ignoring any reference to the design file. Set the simulation time to 500 ns. The
expected result is depicted in Figure 24.4. 


FIGURE 24.4. Simulation results from the testbench of Example 24.1.


24.5  Testbench Template 


We introduce now a template for a test file used in real simulations. It contains a full testbench (whose 
output-verification part is optional) plus the code needed for that file to interface with a design file. The 
latter is assembled in the same way it was done in the previous chapters, with the only exception being 
that internal delay specifications are now accepted (they were not used before because delays are not 
synthesizable). The test file, on the other hand, is assembled differently because its only purpose is to 
create testbenches, having therefore nothing to do with the hardware. 

The template is shown in Figure 24.5. In this example, the test file is called test_mycircuit.vhd (see the 
entity’s name in line 5), while the design file to which it relates is called mycircuit.vhd (see the component's 
name in line 10). Note that the file again contains the same three classical code sections seen in Chapter 19 
(Figure 19.2), that is, library declarations (if necessary), ENTITY, and ARCHITECTURE. However, the entity 
(lines 5-6) is empty and the architecture (lines 8-38) contains only time-related statements (besides dec- 
larations and instantiations). 

In the declarative part of the architecture (before the word BEGIN), a COMPONENT must be declared 
(lines 10-13), which is a copy of the DUT’s entity (as seen in Section 19.13). Next, local signals, resem- 
bling those in the component, must also be declared (lines 15-17). These signals were named t_clk (test 
clock), t_x, and t_y, but they could have received the same names as the original signals (generally 
recommended to avoid confusion). Note that the initial value for t_y (line 17) was not specified because 
it is an output (initial values for inputs are optional). 

The first part of the code proper (after the word BEGIN in the architecture) contains a COMPONENT 
instantiation (lines 20 and 21), done exactly in the way seen in Section 19.13, with nominal mapping 
chosen in this case. The second part of the architecture (lines 22-31) contains the processes that generate 
the input waveforms; in this example, one process per signal was employed, which is generally recommended
unless the signals are highly related. The third (and final) part of the architecture (lines 32-37)
is optional; it contains output verifiers, with the ASSERT statement used in this case. 

Complete simulation examples, employing the template above in both versions (stimulus-only and 
full), without and with circuit propagation delays, are presented in the sections that follow. 
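

As a rough illustration of the structure just described, the sketch below pairs a test file with a small design file. It is not the listing of Figure 24.5: the DUT shown here (a D-type flip-flop named mycircuit with ports clk, x, and y) is an assumption made only so that the code is complete, and the timing values and the message text are likewise hypothetical. The test-file parts follow the order described above: component declaration, local signals (t_clk, t_x, t_y), component instantiation with nominal mapping, one stimulus process per input, and an optional ASSERT-based output verifier.

```vhdl
---- Assumed design file (mycircuit.vhd): a D-type flip-flop ----
ENTITY mycircuit IS
   PORT (clk, x: IN BIT;
         y: OUT BIT);
END mycircuit;

ARCHITECTURE sketch OF mycircuit IS
BEGIN
   PROCESS (clk)
   BEGIN
      IF (clk'EVENT AND clk='1') THEN
         y <= x;
      END IF;
   END PROCESS;
END sketch;

---- Test file (test_mycircuit.vhd) ----
ENTITY test_mycircuit IS           -- empty entity
END test_mycircuit;

ARCHITECTURE testbench OF test_mycircuit IS
   ---- component declaration (copy of the DUT's entity): ----
   COMPONENT mycircuit IS
      PORT (clk, x: IN BIT;
            y: OUT BIT);
   END COMPONENT;
   ---- local signals: ----
   SIGNAL t_clk: BIT := '0';
   SIGNAL t_x: BIT := '0';
   SIGNAL t_y: BIT;                -- output: no initial value
BEGIN
   ---- component instantiation (nominal mapping): ----
   dut: mycircuit PORT MAP (clk=>t_clk, x=>t_x, y=>t_y);
   ---- generate clock (rising edges at 30 ns, 90 ns, ...): ----
   PROCESS
   BEGIN
      WAIT FOR 30 ns;
      t_clk <= NOT t_clk;
   END PROCESS;
   ---- generate x (goes high at 45 ns): ----
   PROCESS
   BEGIN
      WAIT FOR 45 ns;
      t_x <= '1';
      WAIT;
   END PROCESS;
   ---- output verification (optional): ----
   PROCESS
   BEGIN
      WAIT FOR 100 ns;             -- after the clock edge at 90 ns
      ASSERT (t_y='1')
         REPORT "y did not follow x"
         SEVERITY FAILURE;
      WAIT;
   END PROCESS;
END testbench;
```

Removing the last process turns this full testbench into a stimulus-only one.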


24.6 Writing Type I Testbenches 


We present now complete simulation examples using VHDL and ModelSim. Because this section deals 
with Type I testbenches, the DUT’s internal propagation delays are not included (functional analysis) and 
the output results are examined by visual inspection (manual analysis). This type of test is also called 
stimulus-only functional analysis. 


FIGURE 24.5. Template for a test file employed in actual simulations (contains a full testbench plus interface
with a design file). The template comprises library declarations, an empty entity, and an architecture with
declarations, component instantiation, stimulus generation, and (optional) output verification.


EXAMPLE 24.2 TYPE I SIMULATION OF A CLOCK DIVIDER


This example shows the functional simulation of a circuit that divides the clock frequency by 10.
As shown in Figure 24.6, the circuit also contains an enable input, which causes the output to hold
its value when unasserted. Output verification will not be included yet, so a stimulus-only testbench
must be employed.


SOLUTION 
The solution is divided into three parts: design file, test file, and simulation. 


FIGURE 24.6. Clock divider of Example 24.2.


Design file: This is a very simple circuit (so we can concentrate on the simulation aspects) 
for which a design file is shown below. Because all signals are of type BIT, no additional library 
declarations are needed (so the code has only the ENTITY and ARCHITECTURE parts). As can be 
seen in the entity (lines 2-5), the inputs are called clk and ena, while the output is called output. 
The architecture (lines 7-25) contains a counter that counts from 0 to 4, flipping the value of temp 
every time the counter returns to zero, hence dividing the clock frequency by 10. The value of 
temp is eventually passed to output in line 22. Note that the counter only progresses if ena='1' 
(lines 15 and 16). 


1  ---- Design file (clock_divider.vhd): ------------
2  ENTITY clock_divider IS
3     PORT (clk, ena: IN BIT;
4           output: OUT BIT);
5  END clock_divider;
6  --------------------------------------------------
7  ARCHITECTURE clock_divider OF clock_divider IS
8     CONSTANT max: INTEGER := 5;
9  BEGIN
10    PROCESS(clk)
11       VARIABLE count: INTEGER RANGE 0 TO 7 := 0;
12       VARIABLE temp: BIT;
13    BEGIN
14       IF (clk'EVENT AND clk='1') THEN
15          IF (ena='1') THEN
16             count := count + 1;
17             IF (count=max) THEN
18                temp := NOT temp;
19                count := 0;
20             END IF;
21          END IF;
22          output <= temp;
23       END IF;
24    END PROCESS;
25 END clock_divider;
26 --------------------------------------------------


Test file: A test file for the clock_divider design is shown below (called test_clock_divider), 
which directly resembles the template of Figure 24.5 (without the optional output-verification 
section). Note the following: the entity (lines 2 and 3) is empty; the declarative part of the 
architecture contains a component declaration (lines 7-10) plus local signal declarations 
(lines 12-14); the code proper (lines 15-31) starts with a component instantiation (line 17) 
followed by one process to generate the clock waveform (lines 19-23) and another for enable 
(lines 25-30). The first signal (clk) is similar to that in Figure 24.2(a), while the second (ena) is
analogous to Figure 24.2(b). T=30ns was assumed in this example. Observe that again AFTER 
was not employed. 


1  ---- Test file (test_clock_divider.vhd): -----------
2  ENTITY test_clock_divider IS
3  END test_clock_divider;
4  ----------------------------------------------------
5  ARCHITECTURE test_clock_divider OF test_clock_divider IS
6     ---- component declaration: ------------------
7     COMPONENT clock_divider IS
8        PORT (clk, ena: IN BIT;
9              output: OUT BIT);
10    END COMPONENT;
11    ---- signal declarations: --------------------
12    SIGNAL clk: BIT := '0';
13    SIGNAL ena: BIT := '0';
14    SIGNAL output: BIT;
15 BEGIN
16    ---- component instantiation: ----------------
17    dut: clock_divider PORT MAP (clk=>clk, ena=>ena, output=>output);
18    ---- generate clock: -------------------------
19    PROCESS
20    BEGIN
21       WAIT FOR 30 ns;
22       clk <= NOT clk;
23    END PROCESS;
24    ---- generate enable: ------------------------
25    PROCESS
26    BEGIN
27       WAIT FOR 60 ns;
28       ena <= '1';
29       WAIT; --optional
30    END PROCESS;
31 END test_clock_divider;
32 ----------------------------------------------------


Simulation: Simulation results, obtained with ModelSim (see tutorial in Appendix A), are shown
in Figure 24.7. Just to practice with the simulator, after running the simulation the reader is invited to
add a cursor to the plot, then click the output signal’s name (.../output) to highlight it, and last click
one of the Next Transition icons to force the cursor to snap to one of the output transitions. Check the
time value at the cursor’s foot at every transition to certify that the transitions coincide with proper
clock edges and that the output frequency is one tenth of the clock frequency (f_clk/10).


FIGURE 24.7. Simulation results from the clock divider of Example 24.2.


EXAMPLE 24.3 TYPE I SIMULATION OF AN ADDER


This example illustrates the generation of multibit waveforms (as in Figure 24.2(e)). For that, an 
adder is designed with unsigned inputs varying from 0 to 255. 


SOLUTION 


The solution is again divided into three parts: design file, test file, and simulation. 

Design file: This is another simple circuit, which suits perfectly the purpose of illustrating multibit 
signal generation. The design file is shown below, where a and b are 8-bit inputs and sum is the 
corresponding 9-bit output. 


1  ---- design file (adder.vhd): -------------
2  ENTITY adder IS
3     PORT (a, b: IN INTEGER RANGE 0 TO 255;
4           sum: OUT INTEGER RANGE 0 TO 511);
5  END adder;
6  -------------------------------------------
7  ARCHITECTURE adder OF adder IS
8  BEGIN
9     sum <= a+b;
10 END adder;
11 -------------------------------------------


Test file: The test file is shown next, again based on the template seen in Figure 24.5. It generates
two multibit stimuli (a and b), declared as integers (lines 9-10), with no initial values. The first
process (lines 16-26) generates a with the following values: a=0 for 50 ns, a=150 for 50 ns more,
a=200 for 50 ns again, and finally a=250 for another 50 ns, after which the signal repeats itself. The
second process (lines 28-36) generates b with the following values: b=0 for 75 ns, b=120 for 75 ns more,
and finally b=240 for 50 ns, after which b also repeats itself. Note that the listing starts directly with the
architecture; for the file to compile, an empty entity declaration (ENTITY test_adder IS END test_adder;)
must precede it.


1  ---- test file (test_adder.vhd): -------------
2  ARCHITECTURE test_adder OF test_adder IS
3     ---- component declaration: ---------------
4     COMPONENT adder IS
5        PORT (a, b: IN INTEGER RANGE 0 TO 255;
6              sum: OUT INTEGER RANGE 0 TO 511);
7     END COMPONENT;
8     ---- signal declarations: ----------------
9     SIGNAL a: INTEGER RANGE 0 TO 255;
10    SIGNAL b: INTEGER RANGE 0 TO 255;
11    SIGNAL sum: INTEGER RANGE 0 TO 511;
12 BEGIN
13    ---- component instantiation: ------------
14    dut: adder PORT MAP (a=>a, b=>b, sum=>sum);
15    ---- signal a: ---------------------------
16    PROCESS
17    BEGIN
18       a <= 0;
19       WAIT FOR 50 ns;
20       a <= 150;
21       WAIT FOR 50 ns;
22       a <= 200;
23       WAIT FOR 50 ns;
24       a <= 250;
25       WAIT FOR 50 ns;
26    END PROCESS;
27    ---- signal b: ---------------------------
28    PROCESS
29    BEGIN
30       b <= 0;
31       WAIT FOR 75 ns;
32       b <= 120;
33       WAIT FOR 75 ns;
34       b <= 240;
35       WAIT FOR 50 ns;
36    END PROCESS;
37 END test_adder;
38 ----------------------------------------------


Simulation: The reader is invited to simulate this circuit using ModelSim and the tutorial presented
in Appendix A. Set the simulation time to 280 ns. The result should look like that in Figure 24.8.
Just to practice with the simulator, place the mouse on the output signal’s name (.../sum), right-click
it, and select Properties > Wave Color > Colors=Yellow; this will differentiate the output from
the inputs. Click the output signal again, select Properties > Radix=Binary, and note that the
representation of sum changes from default (unsigned decimal) to binary; next, return it to its original
representation.


FIGURE 24.8. Simulation results from the adder of Example 24.3.


24.7 Writing Type II Testbenches 


Contrary to the examples above, the examples shown in this section do take into account propagation 
delays inside the circuits (timing analysis instead of functional analysis). The output verification, however, 
is still not automated. This type of simulation is also called stimulus-only timing analysis. 




EXAMPLE 24.4 TYPE II SIMULATION OF A CLOCK DIVIDER


The simulation of Example 24.2 must be redone, now with the following DUT’s propagation delays 
included: 8ns to increment the counter, 5ns to store the output. 


SOLUTION 


The solution is again divided into three parts: design file, test file, and simulation. 

Design file: Internal delays must be included in the design file, so the design file of Example 24.2 was 
repeated below with some modifications. Because when WAIT is employed the process cannot have a 
sensitivity list, WAIT UNTIL (line 14) was employed instead of IF to detect clock events. The delay rela- 
tive to the counter is in line 16, while that relative to the output flip-flop is in line 23 (the latter could 
have been located between lines 18 and 19). With these delays, the output is only expected to receive a 
new value 8+5=13ns after the proper clock edge. Note that again AFTER was not employed. 


1  ---- Design file (clock_divider.vhd): -------------
2  ENTITY clock_divider IS
3     PORT (clk, ena: IN BIT;
4           output: OUT BIT);
5  END clock_divider;
6  ---------------------------------------------------
7  ARCHITECTURE clock_divider OF clock_divider IS
8     CONSTANT max: INTEGER := 5;
9  BEGIN
10    PROCESS
11       VARIABLE count: INTEGER RANGE 0 TO 7 := 0;
12       VARIABLE temp: BIT;
13    BEGIN
14       WAIT UNTIL (clk'EVENT AND clk='1');
15       IF (ena='1') THEN
16          WAIT FOR 8 ns; --counter delay=8ns
17          count := count + 1;
18          IF (count=max) THEN
19             temp := NOT temp;
20             count := 0;
21          END IF;
22       END IF;
23       WAIT FOR 5 ns; --output delay=5ns
24       output <= temp;
25    END PROCESS;
26 END clock_divider;
27 ---------------------------------------------------


Test file: See Example 24.2. 

Simulation: Simulation results (with ModelSim, Appendix A) are shown in Figure 24.9. It is 
important to confirm that now the output transitions do not coincide with clock transitions but are 
13ns delayed with respect to them. To do so, after running the simulation insert two cursors in the 
wave window, then click the output signal (.../output) to highlight it, and finally use the Next Transi- 
tion icons to have the cursors snap to two adjacent transitions (as in Figure 24.9). Observe at the foot 
of each cursor that, as expected, the transitions are 13ns after the proper clock edges and that the 
distance between the cursors is 300ns. 


FIGURE 24.9. Simulation results from the clock divider with delay of Example 24.4.


EXAMPLE 24.5 TYPE II SIMULATION OF AN ADDER


This example is similar to Example 24.3 but now with an internal delay included. It illustrates the 
introduction of delay in a combinational circuit. As we know from Sections 12.2 and 12.3, the actual 
delay in an adder depends on which bits have changed. Consequently, if a fixed value is used in the 
simulations, it should correspond to the worst-case scenario. The simulation of Example 24.3 must 
be repeated here with a fixed (worst-case) delay of 12ns included in the design file. 


SOLUTION 


The solution is again divided into three parts: design file, test file, and simulation. 

Design file: The design file is shown below. Comparing it to the file in Example 24.3, we observe 
that now a process is required because WAIT is a sequential statement. Recall that a process cannot 
have a sensitivity list when WAIT is employed. Note also the presence of WAIT UNTIL in line 11, 
which is fundamental to correctly establish a fixed event-based delay (12ns in this case). 


1  ---- design file (adder.vhd): -------------
2  ENTITY adder IS
3     PORT (a, b: IN INTEGER RANGE 0 TO 255;
4           sum: OUT INTEGER RANGE 0 TO 511);
5  END adder;
6  -------------------------------------------
7  ARCHITECTURE adder OF adder IS
8  BEGIN
9     PROCESS
10    BEGIN
11       WAIT UNTIL a'EVENT OR b'EVENT;
12       WAIT FOR 12 ns;
13       sum <= a+b;
14    END PROCESS;
15 END adder;
16 -------------------------------------------
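

For comparison only, the same fixed delay might also be written with AFTER in a single concurrent assignment, as sketched below. The entity name adder_after is hypothetical, and this form presumes a simulator that supports AFTER. Be aware that the two descriptions are not equivalent in every situation: a plain AFTER clause models an inertial delay, so input changes lasting less than 12ns do not reach the output, whereas the process above ignores any input event that occurs while it is suspended in WAIT FOR.

```vhdl
---- Alternative design file using AFTER (sketch) ----
ENTITY adder_after IS              -- hypothetical name
   PORT (a, b: IN INTEGER RANGE 0 TO 255;
         sum: OUT INTEGER RANGE 0 TO 511);
END adder_after;

ARCHITECTURE sketch OF adder_after IS
BEGIN
   -- Concurrent assignment: sum receives a+b 12 ns after a or b changes
   sum <= a + b AFTER 12 ns;
END sketch;
```

With the stimuli of Example 24.3, whose input transitions are at least 25ns apart, both versions should place each output transition 12ns after the corresponding input transition.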


Test file: See Example 24.3. 

Simulation: Simulation results (with ModelSim, Appendix A) are shown in Figure 24.10. It is important 
to confirm that now the output transitions are 12ns delayed with respect to any input transition. To do so, 
after running the simulation use the Next Transition icons to have the cursor jump from one transition to 
another, always checking the time value at the cursor’s foot (as illustrated in Figure 24.10). 


FIGURE 24.10. Simulation results from the adder with delay of Example 24.5.


24.8 Writing Type III Testbenches 


In this case, the DUT’s internal delays are not considered, but the output is automatically verified by 
the simulator. This procedure, also called automated functional analysis, can be observed as part of the full 
testbench example shown in the next section. 
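

A possible Type III test for the clock divider is sketched below, reusing the design file of Example 24.2 (no internal delays). The test entity name (test3_clock_divider), the message text, and the choice of sampling at falling clock edges are assumptions made for this sketch. With the stimuli of Example 24.2 (rising clock edges at 30ns, 90ns, ..., and ena asserted at 60ns), the fifth enabled edge occurs at 330ns, so the expected output stays low for 330ns and then toggles every 300ns.

```vhdl
---- Design file of Example 24.2 (no delays) ----
ENTITY clock_divider IS
   PORT (clk, ena: IN BIT;
         output: OUT BIT);
END clock_divider;

ARCHITECTURE clock_divider OF clock_divider IS
   CONSTANT max: INTEGER := 5;
BEGIN
   PROCESS (clk)
      VARIABLE count: INTEGER RANGE 0 TO 7 := 0;
      VARIABLE temp: BIT;
   BEGIN
      IF (clk'EVENT AND clk='1') THEN
         IF (ena='1') THEN
            count := count + 1;
            IF (count=max) THEN
               temp := NOT temp;
               count := 0;
            END IF;
         END IF;
         output <= temp;
      END IF;
   END PROCESS;
END clock_divider;

---- Type III test file (sketch) ----
ENTITY test3_clock_divider IS      -- hypothetical name
END test3_clock_divider;

ARCHITECTURE testbench OF test3_clock_divider IS
   COMPONENT clock_divider IS
      PORT (clk, ena: IN BIT;
            output: OUT BIT);
   END COMPONENT;
   SIGNAL clk: BIT := '0';
   SIGNAL ena: BIT := '0';
   SIGNAL output: BIT;
   SIGNAL template: BIT := '0';    -- expected output
BEGIN
   dut: clock_divider PORT MAP (clk=>clk, ena=>ena, output=>output);
   ---- generate clock: ----
   PROCESS
   BEGIN
      WAIT FOR 30 ns;
      clk <= NOT clk;
   END PROCESS;
   ---- generate enable: ----
   PROCESS
   BEGIN
      WAIT FOR 60 ns;
      ena <= '1';
      WAIT;
   END PROCESS;
   ---- generate template (low for 330 ns, then toggles every 300 ns): ----
   PROCESS
   BEGIN
      WAIT FOR 330 ns;
      LOOP
         template <= NOT template;
         WAIT FOR 300 ns;
      END LOOP;
   END PROCESS;
   ---- verify output at falling clock edges: ----
   PROCESS
   BEGIN
      WAIT UNTIL (clk'EVENT AND clk='0');
      ASSERT (output=template)
         REPORT "Output differs from template"
         SEVERITY FAILURE;
   END PROCESS;
END testbench;
```

Sampling at the falling edges keeps the comparison away from the instants at which output and template change, which happen at rising edges.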


24.9 Writing Type IV Testbenches 


This is the complete (and more complex) simulation procedure. The DUT’s internal propagation delays 
are taken into account and the output is automatically verified by the simulator. It is also called full-bench 
or automated timing analysis. 


EXAMPLE 24.6 TYPE IV SIMULATION OF A CLOCK DIVIDER


We close the present series of examples with a timing simulation where a full testbench is employed, 
thus corresponding to the most encompassing verification option (i.e., design file with internal 
delays plus test file with full testbench). The clock divider of Examples 24.2 and 24.4 is employed 
again, now with output-verification features included in the testbench. This type of verification can 
be performed in several ways, like comparison between the output signal and a few reference values 
taken at particularly important points, or comparison between the (almost) complete output wave- 
form against the expected waveform, or by comparing two text files, one containing actual results 
and the other expected results. The case illustrated in this example is a direct comparison between 
the actual and the expected output waveforms. 


SOLUTION 


As in all previous examples, the solution is divided into three parts: design file, test file, and simulation. 

Design file: See Example 24.4 (internal delays of 8ns and 5ns are included). 

Test file: A full testbench is shown below. It is the same file used in Examples 24.2 and 24.4, 
but with a few modifications introduced. A signal called template was created, whose shape is that 
expected for output. This signal (template) was declared in line 15, then generated in the process of 
lines 32-39. It stays low (initial value) during 343ns (line 34), then switches between high and low, 
with 300ns in each state (lines 36 and 37). A permanent loop (based on ena='1', lines 35-38) was 
employed to generate it. This signal is then compared against the actual output in the next process 
(lines 41-48). Note that the comparison is made every 1 ns (line 43), but it can be more “tolerant” if 
that value is increased. To carry out the comparisons, the ASSERT statement is employed (lines 44-47), 
whose syntax is the following: 


ASSERT (boolean_expression) [REPORT "message"] [SEVERITY severity_level]; 


The severity level can be NOTE (to pass information from the simulator), WARNING (to inform 
that something unusual has occurred), ERROR (to inform that a serious unusual condition has 
been found), or FAILURE (a completely unacceptable condition). The message is written when the 
condition is false, with the last two (ERROR and FAILURE) normally causing the simulator to halt 
(the exact severity at which execution stops is a simulator setting). Therefore, if the assertion 
in line 44 is not true, then the message of line 45 will be displayed and the simulator will halt. 
If line 46 is replaced with line 47, then the same message will be displayed, but just as a 
warning, and the simulator will continue execution. 


1  ---- Test file (test_clock_divider.vhd): ----------------------
2  ENTITY test_clock_divider IS
3  END test_clock_divider;
4  ----------------------------------------------------------------
5  ARCHITECTURE test_clock_divider OF test_clock_divider IS
6     ---- component declaration: -------------
7     COMPONENT clock_divider IS
8        PORT (clk, ena: IN BIT;
9           output: OUT BIT);
10    END COMPONENT;
11    ---- signal declarations: ---------------
12    SIGNAL clk: BIT := '0';
13    SIGNAL ena: BIT := '0';
14    SIGNAL output: BIT;
15    SIGNAL template: BIT := '0'; --for output verification
16 BEGIN
17    ---- component instantiation: -----------
18    dut: clock_divider PORT MAP (clk=>clk, ena=>ena, output=>output);
19    ---- generate clock: --------------------
20    PROCESS
21    BEGIN
22       WAIT FOR 30 ns;
23       clk <= NOT clk;
24    END PROCESS;
25    ---- generate enable: -------------------
26    PROCESS
27    BEGIN
28       WAIT FOR 60 ns;
29       ena <= '1';
30    END PROCESS;
31    ---- generate template: -----------------
32    PROCESS
33    BEGIN
34       WAIT FOR 343 ns;
35       WHILE ena='1' LOOP
36          template <= NOT template;
37          WAIT FOR 300 ns;
38       END LOOP;
39    END PROCESS;
40    ---- verify output: ---------------------
41    PROCESS
42    BEGIN
43       WAIT FOR 1 ns;
44       ASSERT (output=template)
45          REPORT "Output differs from template!"
46          SEVERITY FAILURE;
47          --SEVERITY WARNING;
48    END PROCESS;
49 END test_clock_divider;
50 ----------------------------------------------------------------


Simulation: It is left to the reader to simulate the files above using ModelSim and the tutorial 
presented in Appendix A (Exercise 24.13). Set the simulation time to 1s. Include template in the 
wave window so it can also be visually compared to output. Run the simulation in the following four 
cases: 


a. With the test file as shown above; in this case, output=template, so no error messages are 
expected. 


b. With the time in line 37 changed to 301 ns; now an error (and halt) is expected at time = 644ns. 


c. With line 37 still with 301 ns but the “resolution” (time in line 43) changed to 5ns; errors can now 
only be detected at time values that are multiples of 5 (the first should occur at time =1245ns). 


d. Same as (b) but with severity_level=WARNING instead of FAILURE; several errors must be 
reported (at 644ns, 944ns, etc.) but only as warnings (no halt). 

Final Notes: 


■ The attribute s'LAST_EVENT can be useful in simulations; it returns the time elapsed since the last 
event occurred in the signal s. Example: 


ASSERT (s'LAST_EVENT=25 ns) 

■ The predefined function NOW can also be useful; it returns the current simulation time. Example: 


VARIABLE start: TIME; 
WAIT UNTIL enable='1'; 
start := NOW; 


■ The use of data files to store/read simulation data to/from is often helpful, especially in large 
designs. For instance, ModelSim allows the creation of a data memory whose contents can be written 
to or loaded from a file. The use of such files is normally described in the tutorials that accompany 
the simulation software. 


24.10 Exercises 


In all exercises below, run proper simulations to verify the correctness of your solutions. 
1. Stimulus generator #1 
Write a VHDL code from which all waveforms of Figure E24.1 can be inferred. Adopt T=5ns and 
solve the exercise for the following cases: 
a. With repetitive signals (that is, with the signals of Figure E24.1 repeating themselves with 
period = 8T). 
b. With nonrepetitive signals (all values at '0' for time > 8T). 


To test your solutions, use the procedure shown in Example 24.1. 


FIGURE E24.1. 


2. Stimulus generator #2 


Write a VHDL code from which the waveforms y and z shown in Figure E24.2 can be inferred. 
Consider x as a reference (given) signal and that T=15ns. To test your solution, use the procedure 
shown in Example 24.1. 

FIGURE E24.2. 


3. Stimulus generator #3 


Write a VHDL code from which all waveforms of Figure E24.3 can be inferred. Note that only x, 
and x, are repetitive. Adopt T,=10ns and T,=25ns. To test your solution, use the procedure shown 
in Example 24.1. 


FIGURE E24.3. 


4. Stimulus generator #4 


Write a VHDL code from which the testbench of Figure E24.4 can be inferred. Note that the second 
waveform is a sequence of ASCII characters (Figure 2.12), so lowercase and capital letters are repre- 
sented differently. That sequence must repeat indefinitely (with period =7T, where T=20ns). To test 
your solution, again use the procedure shown in Example 24.1. 




FIGURE E24.4. 


5. Type I simulation of a parity detector 


Create a stimulus-only testbench (as in Examples 24.2 and 24.3) for the parity detector of Example 
19.3. The code should generate the stimulus x shown in Figure 19.7. Use only the first six time 
intervals, each lasting 100ns (repetitivity is optional). Set the simulation time to 600ns. The design 
file is that in Example 19.3 with no internal delays (functional simulation); the GENERIC statement 
can be removed, with N=8 adopted in the design. 


6. Type II simulation of a parity detector 


In continuation of the exercise above, introduce a 5ns delay in the design file corresponding to the time 
needed for the circuit to compute y. Recall that a process is then necessary because WAIT is a sequential 
statement, and consequently GENERATE must be replaced with LOOP. Because this is a combinational 
circuit (therefore asynchronous), observe how WAIT UNTIL was employed in the adder of Example 
24.5. Compile and simulate your code to check whether the correct delay occurs at the output. 


7. Type IV simulation of a parity detector 


Still in continuation to the exercise above, introduce now in the test file some type of information 
about expected values for y (it can be a waveform, like Example 24.6), then create a process that 
compares those values to the actual values of y using the ASSERT statement. Run the simulation 
under different circumstances (without and with error and with different severity levels), as done 
in Example 24.6. 


8. Type I simulation of a data sorter 


Create a stimulus-only testbench (as in Examples 24.2 and 24.3) for the data sorter of Example 19.9. 
The code should generate the stimuli a and b shown in Figure 19.15 during six time intervals of 
100ns each with the following values: a=50 (fixed) and b= (-25, 0, 25, 50, 75, 100). Set the simulation 
time to 600ns. The design file is that in Example 19.9 with no internal delays (functional simulation). 
Note that the compiler might require the data range in the package to be bounded. 


9. Type II simulation of a data sorter 


In continuation of the exercise above, introduce a 10ns delay in the design file corresponding to 
the time needed for the circuit to sort a and b and so produce x and y. Recall that a process is then 
necessary because WAIT is a sequential statement. Because this is a combinational circuit, observe 
how WAIT UNTIL was employed in the adder of Example 24.5. 


10. Type I simulation of a shift register 


Create a stimulus-only testbench (as in Examples 24.2 and 24.3) for the shift register with load 
capability seen in Section 22.1. The test file should generate the same (or approximately the same) 
stimuli used in the simulation with Quartus II (Figure 22.2). Set the simulation time to 1 µs. 
Compare your result for g against that in Figure 22.2. 


11. Type I simulation of a switch debouncer 


Create a stimulus-only testbench (as in Examples 24.2 and 24.3) for the switch debouncer seen in 
Section 22.2. The test file should generate the same (or approximately the same) stimuli used in the 
simulation with Quartus II (Figure 22.4). Set the simulation time to 1 µs. Compare your result for y 
against that in Figure 22.4. 


12. Type I simulation of a finite state machine 


In Section 23.1 an FSM capable of detecting the sequence of characters "mp3" was designed, with 
respective simulation results presented in Figure 23.2 (with Quartus II). Create now a stimulus-only 
testbench to perform a functional simulation for that circuit. The testbench must include the same 
sequence of signals employed in Figure 23.2, that is, a clock with period = 40 ns, a reset active only 
during the initial 40 ns, and finally the string of characters ('0', '3', 'm', 'p', 'a', 'm', 'm', 'p', '3', '3', '3', '0', '0'), 
each lasting 40 ns. Set the simulation time to 520 ns. (Suggestion: Create a 13-character signal of type 
STRING to group all input characters and then use LOOP to run through it.) 


13. Type IV simulation of a clock divider 


Using the type IV simulation procedure, test the circuit seen in Example 24.6 (see the simulation 
conditions at the end of that example). 


Simulation with SPICE 


Objective: This chapter describes the use of SPICE (Simulation Program with Integrated Circuit 
Emphasis) to model and simulate electronic circuits. The main types of analysis (DC, AC, transient, and 
Monte Carlo) are illustrated by several complete examples. This chapter is complemented with a tutorial 
on PSpice, a popular SPICE simulator, in Appendix B. 

25.1 About SPICE 


SPICE (Simulation Program with Integrated Circuit Emphasis) is a general purpose simulator for 
electronic circuits, originally developed at UC Berkeley in the 1970s. Even though intended mainly for analog 
and mixed (analog-digital) circuits, it is also appropriate for relatively small digital circuits, particularly 
in the characterization of standard cells. For very large digital systems, other simulation techniques, like 
VHDL testbenches (Chapter 24), are normally more adequate. 

There are several commercial versions of SPICE, of which HSpice (for workstations, now provided 
by Synopsys) and PSpice (for PCs, now from Cadence) are the most popular. Version 15.7 of the latter 
is described in Appendix B and was employed to simulate all circuits presented in this chapter. It is 
important to mention, however, that the material presented in this chapter is as independent from the 
simulating platform as possible, so it can be easily adapted to any other SPICE simulator. 

Such simulators normally allow two types of entries: by means of schematics (graphical input) or by 
means of SPICE code (textual input). In the former the circuit is drawn and subsequently a schematics 
capture component of the simulation platform is invoked to extract the SPICE code from it. In the latter, 
the SPICE code is entered directly into the simulation environment. 

Coded inputs are recommended because they can be easily modified, partitioned, and reused. Moreover, 
SPICE code is extremely simple to learn, leaving no reason for using the old-fashioned 
schematics-based inputs. Consequently, only coded inputs will be used in this chapter. Nevertheless, in Appendix B, 
which presents a tutorial on PSpice, the use of graphical input will also be described. 

As a final remark, when specifying a value in SPICE, the corresponding unit is optional, in which 
case the SI (International System of Units) is assumed (that is, distances are measured in meters, time 
in seconds, voltages in volts, currents in amperes, resistances in ohms, etc.). However, the use of units 
might help understand and debug the code. Additionally, SPICE simulators are generally not case 
sensitive, so multiples of quantities can be expressed as follows: 


Femto (10^-15): F or f 
Pico (10^-12): P or p 
Nano (10^-9): N or n 
Micro (10^-6): U or u 
Milli (10^-3): M or m 
Kilo (10^3): K or k 
Mega (10^6): MEG or meg 
Giga (10^9): G or g 
Tera (10^12): T or t 
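
Because the simulator is not case sensitive, M always means milli, and mega must be written MEG. The short netlist below is only an illustrative sketch (the element names, node numbers, and values are assumptions, not taken from the examples of this chapter) showing several scaled values side by side:

```spice
* Scale-suffix illustration (hypothetical circuit)
Vin 1 0 DC 5V
R1 1 2 4.7K      ; 4.7 kilohms
R2 2 0 1MEG      ; 1 megohm
R3 2 0 1M        ; 1 milliohm, because M (or m) means milli, not mega
C1 2 0 100N      ; 100 nF
C2 2 0 0.1U      ; 0.1 uF, the same capacitance as C1
.OP
.END
```

Writing R3 as 1M when 1 megohm was intended is a common source of unexpected results, so values in the mega range are best checked first when a simulation looks wrong.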


25.2 Types of Analysis 


SPICE allows several types of simulations, which are summarized below. The most common cases will 
be illustrated with several examples later. 

DC analysis (.DC and .OP commands): As seen in Section 1.11, the DC response of a circuit is its 
response to a large-amplitude slowly-varying stimulus. To obtain it, a DC voltage or current is applied 
to the circuit’s input and is swept between two limits with all resulting steady-state voltages and/or 
currents measured for each input value. The .OP command produces similar results but for a single 
operating point. 

Transient analysis (.TRAN command): As also seen in Section 1.11, this is the response of a circuit to 
a large-amplitude fast-varying stimulus (normally a square pulse) with propagation delays taken into 
account. This analysis, also called time response, is used mainly for testing the temporal behavior of 
digital circuits. 

Fourier analysis (.FOUR command): This shows the spectral components of the transient response. 

AC analysis (.AC command): As seen in Section 1.11, this is the response of a circuit to a 
small-amplitude sinusoidal stimulus whose frequency is varied between two limits. The input-output 
relationship between voltages and/or currents is measured in magnitude and phase, so Bode 
diagrams can be plotted. The main application of AC analysis is for testing the frequency response of 
analog circuits. 

Noise analysis (.NOISE command): For each frequency of the AC analysis, the simulator measures 
the noise contributions at the output from all noise generators in the circuit. 

Monte Carlo analysis (.MC command): This is a statistical analysis in which the circuit parameters 
for which tolerances were specified are randomly varied and the resulting voltages and/or currents are 
measured repeatedly. This type of procedure can be incorporated into the DC, transient, and AC analyses 
mentioned above. 

Sensitivity/Worst-Case analysis (.WCASE command): In this case the circuit parameters for which 
tolerances were specified are varied one at a time in an attempt to obtain the worst-case scenario. Like 
Monte Carlo, this type of procedure can be incorporated into the DC, transient, and AC analyses. 

Of all analysis types above, DC and AC are by far the most commonly used for analog circuits, while 
DC and transient are the most frequently used for digital circuits. To all of these the Monte Carlo option 
can be added. 
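
As a rough preview of how these analyses are requested, the sketch below attaches three of the commands to a single RC low-pass circuit. The circuit, node numbers, and limits are assumptions chosen for illustration, and the command arguments follow the common SPICE forms (.DC source start stop increment; .TRAN step stop; .AC DEC points-per-decade fstart fstop). The PULSE source used for the transient analysis is described in Section 25.6.

```spice
* Three analyses on one RC low-pass circuit (illustrative values)
Vin 1 0 DC 0V AC 1V PULSE(0V 5V 1us 10ns 10ns 5us 10us)
R1 1 2 1K
C1 2 0 1N
.DC Vin 0V 5V 0.1V        ; DC sweep of the input source
.TRAN 10ns 20us           ; transient (time) response
.AC DEC 10 1K 10MEG       ; frequency sweep, 10 points per decade
.PROBE V(1) V(2)
.END
```

Each analysis uses only the part of the source declaration that concerns it: the DC sweep overrides the DC value, the AC analysis uses the AC magnitude, and the transient analysis uses the PULSE specification.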


25.3 Basic Structure of SPICE Code 


The purpose of this first example is simply to introduce the general architecture of SPICE code. Several 
other examples with additional details will be presented later, after we have described how electronic 
devices and voltage/current sources can be modeled and declared. 

The basic structure of SPICE code is depicted in Figure 25.1. In (a), a simple RC (resistive-capacitive) 
circuit is shown, while (b) presents a corresponding SPICE code. Note that all circuit nodes are numbered, 
with zero (0) reserved for GND. In this example, the input stimulus is a voltage called Vin, applied between 
node 1 and GND, which causes the other node voltages and branch currents to also vary. 


(a) [Schematic of the RC circuit: resistors R1, R2 (1.5kΩ), and R3 (2.2kΩ), capacitors C1 and C2, input Vin, output Vout]


(b)
1   *------------------------------------------
2   * ...
3   *------------------------------------------
4   R1 1 2 1k
5   R2 2 3 1.5k
6   R3 3 0 2.2k
7   C1 2 0 2.2u
8   C2 3 0 1u
9   *------------------------------------------
10  *DC analysis:
11  Vin 1 0 ;independent DC source
12  .DC Vin 0V 5V 0.01V
13  *------------------------------------------
14  .PROBE V(1) V(2) V(3) I(R3) I(C2)
15  .END
16  *------------------------------------------

(Lines 4-8: circuit. Lines 10-12: analysis specifications. Lines 14-15: output specifications and end.)


FIGURE 25.1. Basic structure of SPICE code. 

Observe in the code of Figure 25.1(b) that an asterisk can be used to comment out a line, while a 
semi-colon can be used for comments anywhere. Lines 1, 3, 9, 13, and 16 were included to divide the 
code into several parts, making it simpler to describe. 

The code starts with a comment (always recommended) in line 2, followed by three sections. The first 
section (lines 4-8) contains the circuit. Note that resistor names always begin with R and capacitors with 
C. Note also that each declaration includes the device’s connecting nodes followed by the device’s value. 

The second code section (lines 10-12) contains analysis specifications. Line 11 defines the stimulus 
source (called Vin) as being a voltage source (because it begins with V), which is connected between 
nodes 1 and 0. The .DC command in line 12 determines that this is a DC response, with Vin varying from 
0 V to 5 V in steps of 10 mV. 

The third code section (lines 14 and 15) contains output instructions plus the mandatory .END command 
to close the code. The .PROBE command in line 14 determines that the output must be an oscilloscope-like 
display, where the voltages on nodes 1, 2, and 3, as well as the currents through R3 and C2, are plotted. 

Simulation results obtained with PSpice (described in Appendix B) are shown in Figure 25.2. In (a), 
the voltages are shown, while (b) shows the currents. Because the capacitors do not affect the DC (steady- 
state) response, the following relationships are expected (and can be observed in Figure 25.2): 


$$V(2) = \frac{R_2 + R_3}{R_1 + R_2 + R_3}\,V(1) = 0.79\,V(1)$$ (hence V(2) varies from 0 to 3.94 V) 

$$V(3) = \frac{R_3}{R_1 + R_2 + R_3}\,V(1) = 0.47\,V(1)$$ (hence V(3) varies from 0 to 2.34 V) 


[Plots versus Vin, swept from 0 to 5 V; annotated values at Vin = 5 V: V(1) = 5 V, V(2) = 3.94 V, I(R3) = 1.06 mA, I(C2) = 0] 


FIGURE 25.2. Simulation results obtained with the SPICE code of Figure 25.1(b): (a) Voltages on nodes 1, 2, 
and 3; (b) Currents through R3 and C2. 

$$I(R_3) = \frac{V(1)}{R_1 + R_2 + R_3} = 0.21\,V(1)\ \text{mA}$$ (hence I(R3) varies from 0 to 1.06 mA) 


I(C2) = 0 (in steady state no current flows through the capacitors) 


25.4 Declarations of Electronic Devices 


This section describes how resistors (R), capacitors (C), inductors (L), diodes (D), and MOS transis- 
tors (M) can be modeled and declared using the SPICE language. Several other kinds of electronic and 
electrical devices can be modeled as well, but only these five, which are among the most often used, are 
included in this brief tutorial. 


Resistor (R) 
Rxxx +node -node [model] value 


In the syntax above, Rxxx is the resistor’s name, which must start with R (xxx can be any name or 
number). +node and -node are the positive and negative nodes to which the resistor is connected (the plus 
and minus signs are important because they define how voltages on and currents through the resistor will 
be referenced). The resistor model is optional and will be described in the Monte Carlo analysis section. 


Examples: 

R1 1 5 100 ;100Ω resistor connected between nodes 1 and 5 
R1 1 5 100ohm ;same as above 
Rmax 2 0 1MEG ;1MΩ resistor connected between nodes 2 and 0 
Rmax 2 0 1meg ;same as above 
Rmax 2 0 1megohm ;same as above 


Capacitor (C) 


Cxxx +node -node [model] value [init_voltage] 


In the syntax above, Cxxx is the capacitor’s name, which must start with C (xxx can be any name or 
number). +node and -node are the nodes to which the capacitor is connected, model is the capacitor 
model, and init_voltage is its initial voltage. The capacitor model is optional and will be described in 
the Monte Carlo analysis section. 


Examples: 
C3 1 5 22p 3V ;22pF capacitor connected between nodes 1 and 5 with Vinit=3V 
Cload 2 0 1u ;1uF capacitor connected between nodes 2 and 0 and Vinit=0 


Inductor (L) 
Lxxx +node -node [model] value [init_current] 


In the syntax above, Lxxx is the inductor’s name, which must start with L (xxx can be any name or 
number). +node and -node are the nodes to which the inductor is connected, model is the inductor 
model, and init_current is its initial current. The inductor model is optional and will be described in 
the Monte Carlo analysis section. 


Examples: 

L2 1 5 3U 2mA ;3uH inductor connected between nodes 1 and 5 with Iinit=2mA 
Lx 2 3 0.5m ;0.5mH inductor connected between nodes 2 and 3 and Iinit=0 


Diode (D) 


Dxxx Anode Knode model [area] 


In the syntax above, Dxxx is the diode’s name (which must start with D), Anode and Knode are the 
anode and cathode nodes, respectively, model is the diode model (described below), and area is a 
multiplication factor that indicates how many diodes of the same type are connected in parallel (the 
model parameters affected by this factor are IS, RS, CJO, and IBV). 


Examples: 


D1 5 7 1N4007 
Dx 5 2 my_model 


To enter a model, the .MODEL command must be used, which, for the case of a regular diode, is shown 
below (the model parameters are listed subsequently). 


.MODEL model_name D(parameter1=value1 parameter2=value2 ...) 

Parameter   Description                             Default   Unit 
IS          Saturation current                      1E-14     A 
N           Emission coefficient                    1 
RS          Parasitic resistance                    0         Ω 
CJO         Zero-bias junction capacitance          0         F 
VJ          Junction potential                      1         V 
M           Junction grading coefficient            0.5 
FC          Forward-bias depletion capac. coef.     0.5 
TT          Transit time                            0         s 
BV          Reverse breakdown voltage               ∞         V 
IBV         Reverse breakdown current               1E-3      A 
EG          Bandgap energy                          1.11      eV 
XTI         Sat. current temperature exponent       3 
KF          Flicker noise coefficient               0 
AF          Flicker noise exponent                  1 
TNOM        Parameter measurement temperature       27        °C 
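
To see how a diode declaration and its .MODEL line fit together, the following sketch sweeps the voltage applied to a series resistor-diode branch. The model name Dtest, the circuit, and all parameter values are hypothetical, picked only to exercise a few of the parameters listed above; parameters that are not listed keep their defaults.

```spice
* Diode forward characteristic (hypothetical model Dtest)
Vin 1 0 DC 0V
R1 1 2 1K
D1 2 0 Dtest                 ; anode at node 2, cathode at GND
.MODEL Dtest D(IS=1E-14 N=1 RS=5 CJO=2P BV=50 IBV=1E-3)
.DC Vin 0V 2V 0.01V
.PROBE V(2) I(D1)
.END
```

Plotting I(D1) against V(2) should show the familiar exponential turn-on of the diode, with R1 limiting the current at the upper end of the sweep.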


MOSFET (M) 


Mxxx Dnode Gnode Snode Bnode model [W] [L] 
+ [AD=value] [AS=value] [PD=value] [PS=value] 
+ [NRD=value] [NRS=value] [NRG=value] [NRB=value] 


In the syntax above, Mxxx is the MOSFET’s name (which must start with M). Note that the “+” sign 
can be used in SPICE to continue a statement in another line. The meanings of the parameters are listed 
below. 

Dnode = Drain node 

Gnode = Gate node 

Snode = Source node 

Bnode = Bulk (substrate or well) node 

W = Channel width 

L= Channel length 

AD = Area of drain 

AS = Area of source 

PD = Perimeter of drain 

PS = Perimeter of source 

NRx = Relative resistivity of x 


Example: In this example an nMOS transistor (M5) is connected between nodes 3 (drain), 2 (gate), 0 
(source), and 0 again (bulk). Its model is called N and its gate dimensions are W = 3 µm and L = 2 µm. 


M5 3 2 0 0 N W=3U L=2U 


To enter a model, again the .MODEL command must be used, which, for the case of a MOS transistor, 
is the following (for n- and p-channel MOSFETs): 


.MODEL model_name NMOS (LEVEL=value parameter1=value1 parameter2=value2 ...) 
.MODEL model_name PMOS (LEVEL=value parameter1=value1 parameter2=value2 ...) 
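
As a minimal sketch of how a MOSFET declaration and its model work together, the netlist below uses a transistor with the same connections and dimensions as M5 in the example above, attaches a Level 1 model to it, and sweeps the drain voltage. The model name Ntest and the parameter values are assumptions for illustration only (VTO, KP, and LAMBDA are the usual Level 1 names for threshold voltage, transconductance parameter, and channel-length modulation); they do not represent a real fabrication process.

```spice
* nMOS output characteristic (hypothetical Level 1 model Ntest)
Vds 3 0 DC 0V
Vgs 2 0 DC 2V
M5 3 2 0 0 Ntest W=3U L=2U
.MODEL Ntest NMOS (LEVEL=1 VTO=0.7 KP=100U LAMBDA=0.05)
.DC Vds 0V 3V 0.01V
.PROBE ID(M5)
.END
```

Replacing the .MODEL line with a more elaborate one (for example, a BSIM3v3 model) leaves the rest of the netlist unchanged, which is the main benefit of keeping device declarations and models separate.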


A general model for MOS transistors was described in Section 9.3. In SPICE, several additional 
parameters are included to account for second-order effects, which are particularly pronounced in 
deep-submicron devices. 

Several MOSFET models, among many others currently in use (including proprietary models), are 
listed below. Level 3 and BSIM2 are fine even for small transistors, but BSIM3v3 and BSIM4 are more 
accurate for very small devices (particularly under 100 nm). An approach based on compact models, which 
represents a departure from the conventional approaches, is also mentioned. To select a specific model, 
the LEVEL variable must be used. 


Level 1 

This is the Shichman-Hodges model, which consists of basic MOS equations (Equations 9.1-9.3 of 
Chapter 9) with some second-order effects (channel-length modulation and body effect) incorporated 
into them. 


Level 2 


This is the Grove-Frohman model (also called MOS2 model), which improves the Level 1 model by 
adding other second-order effects, like mobility degradation and subthreshold current. 


Level 3 


This model (also called MOS3) is based on empirical equations. Its accuracy is similar to Level 2, but 
the simulations exhibit better convergence and are faster. Like the other two models above, this too is 
not appropriate for very small transistors. A discontinuity with respect to the KAPPA parameter was 
detected and fixed in SPICE 3f2 and later. 


BSIM3v3 


BSIM (Berkeley Short-channel IGFET model, where IGFET stands for insulated-gate field effect 
transistor, a synonym for MOSFET) is a set of MOSFET models developed by the BSIM group of UC Berkeley 
targeting particularly very small-channel transistors (especially sub-100 nm). Versions BSIM1 and BSIM2 
were rapidly improved to version 3.3 for smaller transistors, which became the industry standard. The 
first version of BSIM3v3 (3.3.0) was released in October 1995, and its most recent version (3.3.3) in July 
2005. Its main improvements with respect to previous versions include continuous I-V characteristics 
through all three operating regions (subthreshold, saturation, and linear; see Figure 9.4), threshold 
voltage sensitive to gate size, plus several other short-channel effects. This model is specified as Level 49 in 
HSpice and as Level 7 in PSpice. 


BSIM4 


This model includes current leakage, which is important for present 65 nm (and smaller) MOS 
transistors. It also includes a more accurate mobility prediction and new materials, like nonsilicon channel and 
nonpolysilicon gate. The first version of BSIM4 (4.0.0) was released in March 2000, and its most recent 
version (4.6.1) in May 2007. 


BSIM-SOI 


BSIM-SOI is a SPICE model for SOI (silicon-on-insulator) transistors (seen in Section 9.8). It is an 
extension of BSIM3v3, with SOI substrate effects included. 


Compact MOS Model 


This is a promising approach for compact-modeling next-generation MOSFETs. Based on a unified 
inversion-charge and surface-potential model, it leads to compact and consistent expressions for circuit 
analysis and design. Details can be found in [Galup-Montoro07]. 


As an example of a MOSFET model, BSIM3v3 parameters are shown below for a 0.13 µm CMOS 
process. Recall that LEVEL=49 must be employed instead of LEVEL=7 if running HSpice instead of 
PSpice. A large variety of model parameters can be found at the MOSIS site (www.mosis.org). 


*BSIM3v3 parameters for 0.13um nMOS and pMOS transistors (version 3.1) 
*Level=7 for PSpice, Level=49 for HSpice 


.MODEL N NMOS (LEVEL=7 
+VERSION =3.1             TNOM    =27              TOX     =1.42E-8 
+XJ      =1.5E-7          NCH     =1.7E17          VTH0    =0.629035 
+K1      =0.8976376       K2      =-0.09255        K3      =24.0984767 
+K3B     =-8.2369696      W0      =1.041146E-8     NLX     =1E-9 
+DVT0W   =0               DVT1W   =0               DVT2W   =0 
+DVT0    =2.7123969       DVT1    =0.4232931       DVT2    =-0.1403765 
+U0      =451.2322004     UA      =3.091785E-13    UB      =1.702517E-18 
+UC      =1.22401E-11     VSAT    =1.715884E5      A0      =0.6580918 
+AGS     =0.130484        B0      =2.446405E-6     B1      =5E-6 
+KETA    =-3.043349E-3    A1      =8.18159E-7      A2      =0.3363058 
+RDSW    =1.367055E3      PRWG    =0.0328586       PRWB    =0.0104806 
+WR      =1               WINT    =2.443677E-7     LINT    =6.999776E-8 
+XL      =1E-7            XW      =0               DWG     =-1.256454E-8 
+DWB     =3.676235E-8     VOFF    =-1.493503E-4    NFACTOR =1.0354201 
+CIT     =0               CDSC    =2.4E-4          CDSCD   =0 
+CDSCB   =0               ETA0    =2.342963E-3     ETAB    =-1.5324E-4 
+DSUB    =0.0764123       PCLM    =2.5941582       PDIBLC1 =0.8187825 
+PDIBLC2 =2.366707E-3     PDIBLCB =-0.0431505      DROUT   =0.9919348 
+PSCBE1  =6.611774E8      PSCBE2  =3.238266E-4     PVAG    =0 
+DELTA   =0.01            RSH     =83.5            MOBMOD  =1 
+PRT     =0               UTE     =-1.5            KT1     =-0.11 
+KT1L    =0               KT2     =0.022           UA1     =4.31E-9 
+UB1     =-7.61E-18       UC1     =-5.6E-11        AT      =3.3E4 
+WL      =0               WLN     =1               WW      =0 
+WWN     =1               WWL     =0               LL      =0 
+LLN     =1               LW      =0               LWN     =1 
+LWL     =0               CAPMOD  =2               XPART   =0.5 
+CGDO    =2.32E-10        CGSO    =2.32E-10        CGBO    =1E-9 
+CJ      =4.282017E-4     PB      =0.9317787       MJ      =0.4495867 
+CJSW    =3.034055E-10    PBSW    =0.8             MJSW    =0.1713852 
+CJSWG   =1.64E-10        PBSWG   =0.8             MJSWG   =0.1713852 
+CF      =0               PVTH0   =0.0520855       PRDSW   =112.8875816 
+PK2     =-0.0289036      WKETA   =-0.0237483      LKETA   =1.728324E-3) 


.MODEL P PMOS (LEVEL=7 
+VERSION =3.1             TNOM    =27              TOX     =1.42E-8 
+XJ      =1.5E-7          NCH     =1.7E17          VTH0    =-0.9232867 
+K1      =0.5464347       K2      =8.119291E-3     K3      =5.1623206 
+K3B     =-0.8373484      W0      =1.30945E-8      NLX     =5.772187E-8 
+DVT0W   =0               DVT1W   =0               DVT2W   =0 
+DVT0    =2.0973823       DVT1    =0.5356454       DVT2    =-0.1185455 
+U0      =220.5922586     UA      =3.144939E-9     UB      =1E-21 
+UC      =-6.19354E-11    VSAT    =1.176415E5      A0      =0.8441929 
+AGS     =0.1447245       B0      =1.149181E-6     B1      =5E-6 
+KETA    =-1.093365E-3    A1      =3.467482E-4     A2      =0.4667486 
+RDSW    =3E3             PRWG    =-0.0418549      PRWB    =-0.0212201 
+WR      =1               WINT    =3.007497E-7     LINT    =1.040439E-7 
+XL      =1E-7            XW      =0               DWG     =-2.133809E-8 
+DWB     =1.706031E-8     VOFF    =-0.0801591      NFACTOR =0.9468597 
+CIT     =0               CDSC    =2.4E-4          CDSCD   =0 
+CDSCB   =0               ETA0    =0.4060383       ETAB    =-0.0633609 
+DSUB    =1               PCLM    =2.2703293       PDIBLC1 =0.0279014 
+PDIBLC2 =3.201161E-3     PDIBLCB =-0.057478       DROUT   =0.1718548 
+PSCBE1  =4.876974E9      PSCBE2  =5E-10           PVAG    =0 
+DELTA   =0.01            RSH     =105.3           MOBMOD  =1 
+PRT     =0               UTE     =-1.5            KT1     =-0.11 
+KT1L    =0               KT2     =0.022           UA1     =4.31E-9 
+UB1     =-7.61E-18       UC1     =-5.6E-11        AT      =3.3E4 
+WL      =0               WLN     =1               WW      =0 
+WWN     =1               WWL     =0               LL      =0 
+LLN     =1               LW      =0               LWN     =1 
+LWL     =0               CAPMOD  =2               XPART   =0.5 
+CGDO    =3.12E-10        CGSO    =3.12E-10        CGBO    =1E-9 
+CJ      =7.254264E-4     PB      =0.9682229       MJ      =0.4969013 
+CJSW    =2.496599E-10    PBSW    =0.99            MJSW    =0.386204 
+CJSWG   =6.4E-11         PBSWG   =0.99            MJSWG   =0.386204 
+CF      =0               PVTH0   =5.98016E-3      PRDSW   =14.8598424 
+PK2     =3.73981E-3      WKETA   =7.286716E-4     LKETA   =-4.768569E-3) 


25.5 Declarations of Independent DC Sources 


This section describes how independent DC sources of voltage (V) or current (I) can be modeled and 
declared using the SPICE language. The following three cases will be presented: 


■ Independent DC voltage source (V) 
■ Independent DC current source (I) 
■ Independent source with DC sweep 


Independent DC Voltage Source (V) 
Vxxx +node -node [DC] value 


In the syntax above, Vxxx is the voltage source’s name (which must start with V), +node and -node 
are its positive and negative terminals, respectively, DC is an optional statement to emphasize that it is a 
DC source, and value is its voltage. 


Examples: 


Vin 1 0 DC 5V 
Vin 1 0 5 ;same as above 


Independent DC Current Source (I) 
Ixxx +node -node [DC] value 


In the syntax above, Ixxx is the current source’s name (which must start with I), +node and -node are 
its positive and negative terminals (the current comes out of the “-” side), DC is an optional statement to 
emphasize that it is a DC source, and value is the current. 


Examples: 


Iref 3 0 DC 1mA 
Iref 3 0 1m ;same as above 


Independent Source with DC Sweep 


Vxxx +node -node [DC] [OP]
.DC [LIN] Vxxx V1 V2 increment


This type of source is needed for DC analysis. The first line in the syntax above shows the source’s name
(which must start with V or I), its connecting nodes, the optional word DC to emphasize that it is a DC
source, and an optional operating point (needed when the .OP command is used). The second line contains
the .DC command to cause a DC sweep, followed by the optional word LIN, then the source’s name, the
initial (V1) and final (V2) voltages (or currents), and finally the voltage (current) increment. LIN (default)
indicates that the sweep is linear; the other options are octave and decade, rarely used in DC analysis.

Example: In this example the source, called Vin, is a voltage source connected between nodes 1 and 0,
and varies from 0 V to 3 V in increments of 5 mV. As before, units are optional (though recommended).


Vin 1 0 DC
.DC Vin 0V 3V 5mV
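
As a quick sanity check on the sweep declared above, the source values visited by a linear .DC sweep can be enumerated with a few lines of Python. This is only an illustrative sketch: the helper name dc_sweep_points is hypothetical, and the code assumes that the simulator includes both end points of the range.

```python
def dc_sweep_points(start, stop, step):
    """Return the source values visited by a linear sweep (end points included)."""
    n = int(round((stop - start) / step)) + 1  # assumption: both ends are simulated
    return [start + k * step for k in range(n)]


# Values taken from the example above: .DC Vin 0V 3V 5mV
points = dc_sweep_points(0.0, 3.0, 5e-3)
print(len(points), points[0], points[-1])
```

Under this assumption the sweep contains 601 operating points, which is useful to keep in mind when choosing the increment for a slow simulation.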


25.6 Declarations of Independent AC Sources 


This section describes how independent AC sources of voltage (V) or current (I) can be modeled and 
declared using the SPICE language. The following six cases will be presented: 


■ Independent pulsed AC source
■ Independent piecewise linear AC source
■ Independent sinusoidal AC source
■ Independent exponential AC source
■ Independent frequency-modulated AC source
■ Independent AC source with sinusoidal sweep


Independent pulsed AC source 


Vxxx +node -node PULSE(V1 V2 TD TR TF PW PER) 
Ixxx +node -node PULSE(I1 I2 TD TR TF PW PER) 


This type of signal is depicted in Figure 25.3, where V1-V2 (I1-I2) are the pulse voltages (currents),
TD is the time delay, TR is the rise time, TF is the fall time, PW is the pulse width, and PER is the period. As
before, the use of units is optional, but it makes the declarations easier to understand and debug.


Examples: 

Vin 1 0 PULSE(0V 5V 5US 1US 1US 5US 15US)
Vin 1 0 PULSE(0V 5V 5us 1us 1us 5us 15us) ;same as above
Vin 1 0 PULSE(0 5 5U 1U 1U 5U 15U) ;same as above
Itest 3 4 PULSE(2uA 2.5uA 20ns 0ns 0ns 50ns 75ns)
Itest 3 4 PULSE(2U 2.5U 20N 0N 0N 50N 75N) ;same as above
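
To make the meaning of the seven PULSE parameters concrete, the short Python sketch below evaluates an idealized pulse waveform at an arbitrary time. It is not simulator code: the function name pulse is hypothetical, times are expressed in microseconds for readability, and the ramps are assumed to be perfectly linear.

```python
def pulse(t, v1, v2, td, tr, tf, pw, per):
    """Idealized PULSE(V1 V2 TD TR TF PW PER) value at time t (any consistent unit)."""
    if t < td:
        return v1                      # before the initial delay
    tt = (t - td) % per                # position inside the current period
    if tt < tr:
        return v1 + (v2 - v1) * tt / tr            # rising edge
    if tt < tr + pw:
        return v2                                  # flat top
    if tt < tr + pw + tf:
        return v2 + (v1 - v2) * (tt - tr - pw) / tf  # falling edge
    return v1                          # rest of the period


# Vin 1 0 PULSE(0V 5V 5us 1us 1us 5us 15us), times in microseconds
VIN_PULSE = (0.0, 5.0, 5.0, 1.0, 1.0, 5.0, 15.0)
for t_us in (0.0, 5.5, 8.0, 11.5, 13.0, 23.0):
    print(t_us, pulse(t_us, *VIN_PULSE))
```

Because the edge branches are only entered when the corresponding time is nonzero, zero rise and fall times (as in the Itest declaration) are handled without a division by zero.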


Independent piecewise linear AC source 


Vxxx +node -node PWL(T1 V1 T2 V2 T3 V3...)
Ixxx +node -node PWL(T1 I1 T2 I2 T3 I3...)


This type of signal is depicted in Figure 25.4. In the syntaxes above, V1 (I1) is the voltage (current)
at time T1, V2 (I2) is the voltage (current) at time T2, and so on. As always, the use of units is optional
(but recommended).

Examples: 


Vin 1 0 PWL(0us 0V 1us 5V 1us 0V 2us 5V 2us 0V)
Vin 1 0 PWL(0U 0 1U 5 1U 0 2U 5 2U 0) ;same as above


Independent sinusoidal AC source 
Vxxx +node -node SIN(offset ampl freq [delay] [damping] [phase])
Ixxx +node -node SIN(offset ampl freq [delay] [damping] [phase])


This declaration is for a fixed-frequency sinusoid. For variable frequency (AC sweep), see the last type 
of declaration in this section. 


FIGURE 25.3. (a) AC signal of type PULSE; (b) Corresponding schematics symbols.


FIGURE 25.4. (a) AC signal of type PWL; (b) Corresponding schematics symbols.


FIGURE 25.5. (a)-(b) AC signals of type SIN; (c) Corresponding schematics symbols.


In the syntaxes above, offset is the voltage (current) offset, ampl is the signal amplitude, delay is the
time interval until the sinusoid begins, damping is the damping factor, $\alpha$, which defines a decaying
exponential $e^{-\alpha t}$, and phase is the signal’s initial phase. Again, the use of units is optional.

This type of signal is illustrated in Figure 25.5, which shows in (a) and (b) the signals specified below 
and in (c) corresponding schematics symbols. 


Vin 1 0 SIN(0V 3V 1KHz) ;Figure 25.5(a)
Vin 1 0 SIN(0V 3V 1KHz 0.5US 1000 0DEG) ;Figure 25.5(b)
Vin 1 0 SIN(0 3 1K 0.5U 1000 0) ;same as above
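
The role of each SIN parameter can be illustrated with the Python sketch below, which evaluates a damped sinusoid at a given time. It assumes the expression commonly documented for SPICE-type simulators (constant output before the delay, exponentially damped sine afterward); the function name sin_source is hypothetical, and a particular simulator may differ in details.

```python
import math


def sin_source(t, offset, ampl, freq, delay=0.0, damping=0.0, phase_deg=0.0):
    """Damped sinusoid, assumed form: offset + ampl*exp(-damping*tau)*sin(2*pi*freq*tau + phase)."""
    phase = math.radians(phase_deg)
    if t < delay:
        return offset + ampl * math.sin(phase)   # value held until the sinusoid begins
    tau = t - delay
    return offset + ampl * math.exp(-damping * tau) * math.sin(2 * math.pi * freq * tau + phase)


# SIN(0V 3V 1KHz): a quarter period after t = 0 the signal is at its peak
print(sin_source(0.25e-3, 0.0, 3.0, 1e3))
# SIN(0V 3V 1KHz 0.5US 1000 0DEG): same instant of the cycle, now attenuated by the damping
print(sin_source(0.5e-6 + 0.25e-3, 0.0, 3.0, 1e3, 0.5e-6, 1000.0, 0.0))
```

With a damping factor of 1000, the envelope falls to about 37% of its initial value after 1 ms, which is the compression visible in a damped waveform such as that of Figure 25.5(b).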


Independent exponential AC source 


Vxxx +node -node EXP(V1 V2 TD1 TC1 TD2 TC2)
Ixxx +node -node EXP(I1 I2 TD1 TC1 TD2 TC2)


In the syntaxes above, V1-V2 (I1-I2) are the regime voltages (currents), TD1-TD2 are the time delays,
and TC1-TC2 are the time constants.

This type of signal is illustrated in Figure 25.6, which shows in (a) the signal specified below and in 
(b) corresponding symbols employed in schematics. 


V1 1 0 EXP(0.5V 5V 1ms 0.5ms 5ms 1ms) ;Figure 25.6
V1 1 0 EXP(0.5 5 1m 0.5m 5m 1m) ;same as above
V1 1 0 EXP(0.5 5 1M 0.5M 5M 1M) ;same as above


Independent frequency-modulated AC source 


Vxxx +node -node SFFM(offset c_ampl c_freq mod_index s_freq) 
Ixxx +node -node SFFM(offset c_ampl c_freq mod_index s_freq) 
FIGURE 25.6. (a) AC signal of type EXP; (b) Corresponding schematics symbols.


This type of signal is called SFFM (single-frequency frequency modulated). In the syntaxes above,
offset is the voltage (current) offset, c_ampl is the carrier amplitude, c_freq is the carrier frequency,
mod_index is the modulation index, and s_freq is the modulating signal’s frequency.


Example: 
V1 1 0 SFFM(0V 2V 1MHz 5 10kHz)


Independent AC source with sinusoidal sweep 
Vxxx +node -node [DC] offset [AC] ampl
.AC DEC points/dec init_freq final_freq
.AC LIN total_points init_freq final_freq


This type of source is needed for frequency-response (AC) analysis. The first line in the syntax above 
shows the signal’s name (which must start with V or I), its connecting nodes, the optional word DC, the 
DC offset, the optional word AC, and finally the amplitude of the sinusoid. The second and third lines 
(only one can be used at a time) show the .AC command that causes the frequency sweep, followed 
by the options DEC (decade) or LIN (linear), the number of points per decade or total (depending on 
the option chosen), then the initial and final sweep frequencies. It is important to mention that even if 
a linear sweep is chosen, the results can still be plotted using a logarithmic frequency axis (for Bode 
diagrams). 

Example: In this example the source, called Vin, is a voltage source, connected between nodes 1 and 0,
with 0 V offset and 1.5 V amplitude. The response is measured in decades (logarithmically), with 100
points per decade, over the frequency range from 10 Hz to 10 kHz. As before, units are optional.


Vin 1 0 DC 0V AC 1.5V
.AC DEC 100 10Hz 10kHz


Example: This time the points (frequencies) for which the output is measured are linearly distributed,
with a total of 1000 points (so measurements are taken at ~10 Hz from each other), over the same 10 Hz
to 10 kHz frequency range.


Vin 1 0 DC 0V AC 1.5V
.AC LIN 1000 10Hz 10kHz
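
The difference between the DEC and LIN options can be checked numerically with the Python sketch below, which lists the frequencies of both sweeps just described. The helper names dec_sweep and lin_sweep are hypothetical, and the exact placement of the points is an assumption (a given simulator may round or place them slightly differently).

```python
import math


def dec_sweep(points_per_dec, f_start, f_stop):
    """Logarithmically spaced frequencies, points_per_dec per decade (ends included)."""
    n = int(round(points_per_dec * math.log10(f_stop / f_start))) + 1
    return [f_start * 10 ** (k / points_per_dec) for k in range(n)]


def lin_sweep(total_points, f_start, f_stop):
    """Linearly spaced frequencies, total_points in all (ends included)."""
    step = (f_stop - f_start) / (total_points - 1)
    return [f_start + k * step for k in range(total_points)]


dec = dec_sweep(100, 10.0, 10e3)     # .AC DEC 100 10Hz 10kHz
lin = lin_sweep(1000, 10.0, 10e3)    # .AC LIN 1000 10Hz 10kHz
print(len(dec), dec[1] / dec[0])     # constant ratio between neighbors
print(len(lin), lin[1] - lin[0])     # constant difference between neighbors
```

The linear sweep spends only about 9 of its 1000 points in the first decade (10 Hz to 100 Hz), whereas the decade sweep places 100 points there, which is why DEC is usually preferred for Bode diagrams.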
25.7 Declarations of Dependent Sources


This section describes how dependent voltage (V) or current (I) sources can be declared using the SPICE 
language. Figure 25.7 will be employed in the example. 


Voltage-controlled voltage source (E) 
Exxx +node -node +control_node -control_node multiplier 


Current-controlled current source (F) 
Fxxx +node -node 0V_source multiplier


Voltage-controlled current source (G) 
Gxxx +node -node +control_node -control_node multiplier 


Current-controlled voltage source (H) 
Hxxx +node -node 0V_source multiplier


To create a current-controlled source (Fxxx or Hxxx), a dummy 0 V voltage source is needed (called
0V_source in the syntax above). The current through it is the control current. This procedure is
illustrated in the example below.

Example: Suppose that we want to simulate the circuit of Figure 25.7(a), where the output current is
controlled by the input current (I2 = 100·I1). In Figure 25.7(b) a dummy source (VX) was introduced, so the
SPICE code can now be, for example, as follows (a DC analysis is shown, with the input varying from
0 V to 50 mV in steps of 1 mV; consequently, the output current varies from 0 to 5 mA, causing a voltage
drop on R2 from 0 V to 10 V).


V1 1 0 DC
R1 1 3 1k
R2 2 0 2k
*Dummy source (VX):
VX 3 0 DC 0V
*Current-controlled current source (100*I1):
F1 0 2 VX 100
*DC analysis:
.DC V1 0V 50mV 1mV
.PROBE V(1) V(2) I(R1) I(R2)
.END


FIGURE 25.7. Inclusion of a dummy voltage source to implement a current-dependent current source.
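
The currents and voltages expected from this simulation can be estimated by hand, and the Python sketch below simply repeats that arithmetic. It assumes ideal elements (the dummy source holds node 3 at 0 V, so the whole input voltage appears across the 1 kΩ input resistor); the function name cccs_output is hypothetical.

```python
R_IN = 1e3     # input resistor, 1 kOhm (R1 in the netlist)
R_LOAD = 2e3   # load resistor, 2 kOhm (R2 in the netlist)
GAIN = 100     # multiplier of the current-controlled current source


def cccs_output(v_in):
    """Return (output current, voltage drop on the load) for a given input voltage."""
    i_in = v_in / R_IN          # control current through the 0 V dummy source
    i_out = GAIN * i_in         # current delivered by the dependent source
    return i_out, i_out * R_LOAD


for v_in in (0.0, 25e-3, 50e-3):
    i_out, v_load = cccs_output(v_in)
    print(v_in, i_out, v_load)
```

At the end of the sweep (50 mV) this gives 5 mA and 10 V, matching the ranges quoted in the example.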
25.8 SPICE Inputs and Outputs


This section describes how a circuit can be introduced into a SPICE simulator and the options in which 
the results can be subsequently displayed. 


SPICE inputs 


As mentioned in Section 25.1, SPICE simulators normally allow two types of entries: by means of
schematics (graphical input) or by means of SPICE code (textual input). In the former the circuit is
drawn and subsequently a schematics capture component of the simulation platform is invoked to
extract the SPICE code from it. In the latter, the code is entered directly into the simulation
environment. Because coded inputs can be easily modified, partitioned, and reused, they constitute the
recommended alternative.

A file containing SPICE code is normally saved with the extension .cir. It must contain 
(i) the circuit description, (ii) the models for all special components (transistors, diodes, ICs, 
etc.), (iii) analysis specifications, and finally (iv) output specifications (most of these parts were 
already seen in Figure 25.1). 

With respect to the device models, these are often left in a separate file (also saved with the extension
.cir) to keep the code as short and as clean as possible. Such a file must then be called in the main code
using the command .INC (include). An example is shown below, which contains two files, the first called
mosfet_models.cir, the second inverter.cir. Note that the former is included in the latter using the .INC
command. Observe also that the .OPTIONS command is used in the latter to establish default values for
transistor dimensions.


*File "mosfet_models.cir"
*MOSFET level=3 NMOS and PMOS models

.MODEL N3 NMOS (LEVEL=3
+ TOX = 3.08E-8 NSUB = 1.216456E15 GAMMA = 0.6867485
+ PHI = 0.5 VTO = 0.6076967 DELTA = 0.9370422
+ UO = 542.2148623 ETA = 1.057066E-3 THETA = 0.0709743
+ KP = 7.253862E-5 VMAX = 2.53895E5 KAPPA = 1
+ RSH = 0.0487828 NFS = 4.912536E11 TPG = 1
+ XJ = 3E-7 LD = 1.015042E-9 WD = 6.618607E-7
+ CGDO = 1.75E-10 CGSO = 1.75E-10 CGBO = 1E-10
+ CJ = 2.924107E-4 PB = 0.8547965 MJ = 0.5
+ CJSW = 1.415502E-10 MJSW = 0.0855895 )
*-------------------------------------------
.MODEL P3 PMOS (LEVEL=3
+ TOX = 3.08E-8 NSUB = 1E17 GAMMA = 0.5020503
+ PHI = 0.7 VTO = -0.9340874 DELTA = 0.3060342
+ UO = 103.4547789 ETA = 7.312222E-5 THETA = 0.1274607
+ KP = 2.402976E-5 VMAX = 3.572101E5 KAPPA = 150
+ RSH = 37.9838722 NFS = 6.143406E11 TPG = -1
+ XJ = 2E-7 LD = 1.005237E-14 WD = 9.524549E-7
+ CGDO = 2.09E-10 CGSO = 2.09E-10 CGBO = 1E-10
+ CJ = 3.094774E-4 PB = 0.8 MJ = 0.446774
+ CJSW = 1.71185E-10 MJSW = 0.0917237 )
*-------------------------------------------


*File "inverter.cir"
*SPICE code for a CMOS inverter

.INC mosfet_models.cir
.OPTIONS DEFW=3U DEFL=2U DEFAD=36P DEFAS=36P
M1 3 2 0 0 N3
M2 3 2 1 1 P3 W=6U
CL 3 0 0.1PF
VDD 1 0 DC 3.3V

*Transient analysis:
Vin 2 0 PULSE (0V 3.3V 40N 0.1N 0.1N 40N 80N)
.TRAN 0.1N 160N

.END


SPICE outputs 


Suppose that the input file is called inverter.cir. Then PSpice (Appendix B) generates an output 
text file whose default name is inverter.out, which is particularly helpful when errors occur. Such 
a text file can contain two types of simulation results, determined by the commands .PRINT 
and .PLOT. The first consists of a numeric table, while the second is a very simple plot using 
ASCII characters. This file also contains a table with the results when Monte Carlo analysis is 
performed. 

Besides the inverter.out file, PSpice can also create an inverter.dat file, which allows the simulation
results to be viewed in an oscilloscope-like screen created by the .PROBE command. This option is much
more sophisticated than the other two and makes the inspection of results much easier.

In summary, three types of simulation outputs are available in PSpice, determined by the .PRINT,
.PLOT, and .PROBE commands, whose syntaxes are shown below.


.PRINT simulation_type signal1 signal2 ...

Prints the simulation results in the form of a table in the output text file inverter.out.

.PLOT simulation_type signal1 signal2 ...

Plots the simulation results in a rudimentary graph in the output text file inverter.out.

.PROBE signal1 signal2 ...

Displays the simulation results in an oscilloscope-like screen created using the inverter.dat file.
Example: Suppose that we want to print, plot, and probe the DC voltages at nodes 2, 3, and 6, and the
DC currents through resistor R1 and at the drain terminal of a MOSFET called Mx. Then the following
should be included in the corresponding SPICE code:


.PRINT DC V(2) V(3) V(6) I(R1) ID(Mx)
.PLOT DC V(2) V(3) V(6) I(R1) ID(Mx)
.PROBE V(2) V(3) V(6) I(R1) ID(Mx)


25.9 DC Response Examples 


We present in this section complete SPICE simulations illustrating the DC response. To make the analysis 
more productive, circuits for which the actual results are well known are employed, so the simulation 
results can be easily compared against predictions. 


■ EXAMPLE 25.1 DC RESPONSE OF A DIODE-RESISTOR CIRCUIT


Figure 25.8(a) shows a very simple diode-resistor circuit. Determine the DC current through the
diode for Vin in the 0 V to 2 V range and the following values of R1: 500 Ω, 1 kΩ, and 1.5 kΩ.


FIGURE 25.8. (a) Circuit of Example 25.1; (b) Respective simulation results.


SOLUTION 


This example illustrates the use of the .DC and .PARAM commands. Because three curves must be
determined (one for each value of R1), the experiment could be run three times; however, in this type of
situation it is normally more productive to run them all together because result comparisons are then simpler.

A SPICE code for this circuit is shown below. It starts with a comment (line 2), followed by five
sections. The first section (lines 4-7) contains the diode model (available in your simulator’s library). The
second section (lines 9 and 10) contains the circuit. Note that because the experiment must be run for
several values of R1, a parameter called value was employed to represent R1’s actual value. The next
section (lines 12 and 13) shows a parametric analysis, where the command .PARAM is used to inform
that value is a parameter (its initial value is needed but is not important), which must receive the
values listed in line 13 using the .STEP command and the keywords PARAM and LIST. The fourth section
shows the DC analysis, which says that a voltage source called Vin, connected between nodes 1 and 0,
must vary from 0 V to 2 V in steps of 10 mV. Finally, the fifth section contains output specifications (the
oscilloscope-like output was chosen using the .PROBE command) plus the final .END command.


1  *-------------------------------------------
2  *SPICE code for the circuit of Example 25.1.
3  *---- Diode model: --------------------------
4  .MODEL D1N4007 D(Is=14.11n N=1.984 Rs=33.89m
5  + Ikf=94.81 Xti=3 Eg=1.11
6  + Cjo=25.89p M=0.44 Vj=0.3245
7  + Fco=0.5 Ibv=10u Tt=5.7u Bv=1500)
8  *---- Circuit: ------------------------------
9  R1 1 2 {value}
10 D1 2 0 D1N4007
11 *---- Variable parameters: ------------------
12 .PARAM value=1K
13 .STEP PARAM value LIST 500 1000 1500
14 *---- DC analysis: --------------------------
15 Vin 1 0
16 .DC Vin 0V 2V 10mV
17 *---- Output options: -----------------------
18 .PROBE I(D1)
19 .END


The corresponding simulation results (obtained with PSpice; see tutorial in Appendix B) are
depicted in Figure 25.8(b), with one curve for each value of R1. As expected, only after Vin reaches
the diode’s junction voltage (Vj ≈ 0.6 V) does current start flowing through it. Consequently, the
diode current, given by (Vin - Vj)/R1, should reach the following maximum values (for Vin = 2 V):
(2 - 0.6)/0.5k = 2.8 mA for R1 = 500 Ω, 1.4 mA for R1 = 1 kΩ, and 0.93 mA for R1 = 1.5 kΩ. These values
are very close to those that can be observed in Figure 25.8.
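
The three hand estimates above follow from the same first-order expression, so they are easy to reproduce. The Python sketch below does exactly that; it assumes the constant-drop diode approximation used in the text (0.6 V, no series resistance), and the function name diode_current_max is hypothetical.

```python
V_IN_MAX = 2.0   # largest value of the swept input voltage, in volts
V_J = 0.6        # approximate junction voltage assumed in the text, in volts


def diode_current_max(r_ohms, v_in=V_IN_MAX, v_j=V_J):
    """First-order diode current (Vin - Vj)/R, in amperes."""
    return (v_in - v_j) / r_ohms


for r in (500.0, 1000.0, 1500.0):
    print(r, round(diode_current_max(r) * 1e3, 2), "mA")
```

Small differences with respect to the simulated curves are expected, because the SPICE diode model follows the exponential diode equation rather than a fixed 0.6 V drop.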


EXAMPLE 25.2 DC RESPONSE OF A CMOS INVERTER 


Figure 25.9(a) shows a CMOS inverter. Its DC response was analytically examined in Section 9.5.
Simulate it using SPICE, assuming that VDD = 5 V and the transistor sizes are (W/L)n = 1.5 µm/1 µm
and (W/L)p = 4 µm/1 µm. For the MOSFETs, employ the BSIM3v3 models seen in Section 25.4.
Compare the overall results against those obtained in Section 9.5 and also determine the value of VTR
(Equation 9.7).


SOLUTION 


FIGURE 25.9. (a) CMOS inverter of Example 25.2; (b) Approximate analytical DC response seen in Section 9.5;
(c) DC response obtained with PSpice (transistor sizes and models in (b) and (c) are not the same, so only the
overall curve shapes can be compared).


A SPICE code for this example is shown below. It again starts with a comment in line 2, followed by
three sections. The first section (lines 4-8) contains the circuit. Note that the MOSFET models were
assumed to be in a separate file, called mos_models.cir, included in the code using the .INC command
(line 4). In line 5, .OPTIONS is employed to enter default MOSFET parameters, that is:

DEFW = Default width (shown in micrometers, µm)
DEFL = Default length
DEFAD = Default area of drain (shown in square micrometers, µm²)
DEFAS = Default area of source


Note in the last two that areas are represented by $(\mu\text{m} \cdot \mu\text{m}) = 10^{-6} \cdot 10^{-6}\,\text{m}^2 = 10^{-12}\,\text{m}^2$, hence the suffix pm or PM.
The third section (lines 10-11) contains specifications for the DC response, saying that a voltage
called Vin, connected to the input node 2 (see Figure 25.9(a)), must vary from 0 to 5 V in steps of
5 mV. Finally, lines 13-15 contain output specifications (.PRINT and .PROBE were chosen this time
to measure the voltages on nodes 2 and 3), followed by the mandatory .END command.


4  .INC mos_models.cir
5  .OPTIONS DEFW=1.5u DEFL=1u DEFAD=5p DEFAS=5p
6  Mn 3 2 0 0 N
7  Mp 3 2 1 1 P W=4u
8  VDD 1 0 DC 5V

10 Vin 2 0 DC 0V
11 .DC Vin 0V 5V 5mV

13 .PRINT DC V(2) V(3)
14 .PROBE V(2) V(3)
15 .END


Simulation results obtained with PSpice (see tutorial in Appendix B) are depicted in Figure 25.9(c). As
can be seen, they are very similar to the approximate results obtained analytically in Section 9.5, shown
in Figure 25.9(b). Because the transistor sizes and models are not the same in (b) and (c), only the
overall behaviors should be compared, which are similar. The value of VTR in this circuit is ~1.8 V.


25.10 Transient Response Examples 


We present in this section complete SPICE simulations illustrating the transient response. 


■ EXAMPLE 25.3 TRANSIENT RESPONSE OF A CMOS INVERTER


A CMOS inverter was seen in Figure 25.9(a). Using SPICE, obtain its transient response to square 
voltage pulses for the same supply voltage and transistor sizes employed in Example 25.2. 


SOLUTION 


A SPICE code for this example is shown below. The only difference with respect to the code 
seen in the previous example resides in the section where the type of response is specified (lines 11 
and 12). 


4  .INC mos_models.cir
5  .OPTIONS DEFW=1.5um DEFL=1um DEFAD=5pm DEFAS=5pm
6  Mn 3 2 0 0 N
7  Mp 3 2 1 1 P W=4um
8  *CL 3 0 0.5PF
9  VDD 1 0 DC 5V

11 Vin 2 0 PULSE (0V 5V 5N 0N 0N 5N 10N)
12 .TRAN 0.01N 20N

15 .END
FIGURE 25.10. Transient response relative to the CMOS inverter of Figure 25.9(a): (a) Without any load;
(b) With a 0.5 pF load.


Simulation results, obtained with PSpice (described in Appendix B), are shown in Figure 25.10. The 
results in (a) were obtained without any extra load, while those in (b) correspond to the circuit 
with a 0.5 pF capacitor connected to the output (see line 8 of the code). Observe in Figure 25.10 the 
different time ranges employed in the horizontal axis. 


EXAMPLE 25.4 TRANSIENT RESPONSE OF A D LATCH 


Figure 25.11(a) shows the circuit for a D-type latch (studied in Section 13.3; see Figure 13.8(c)).
Simulate it using SPICE and compare the results against the approximate results seen in Section 13.3
(Figure 13.5). Assume that lambda is 0.3 µm and adopt only minimum-size transistors.


SOLUTION 


The minimum size for MOS transistors normally is W/L = 3λ/2λ; therefore, (W/L)min = 0.9 µm/0.6 µm.
Because all transistors must have these dimensions, it is convenient to use the .OPTIONS command to
enter them as default values. A corresponding SPICE code is shown below. As before, the code starts
with a comment, followed by three sections, the first containing the circuit, the second the analysis
specifications, and finally output specifications and the .END command in the third section. Note that
clk, being a regular waveform, was declared using the keyword PULSE, while d (data), being irregular,
was specified using PWL.


FIGURE 25.11. (a) D latch of Example 25.4; (b) Approximate expected results (borrowed from Figure 13.5);
(c) Simulation results obtained with PSpice.


.INC mos_models.cir
.OPTIONS DEFW=0.9UM DEFL=0.6UM DEFAD=5PM DEFAS=5PM

M1 6 5 2 0 N
M2 6 4 2 1 P
M3 7 6 0 0 N
M4 7 6 1 1 P
M5 8 7 0 0 N
M6 8 7 1 1 P
M7 10 7 0 0 N
M8 6 4 10 0 N
M9 6 5 9 1 P
M10 9 7 1 1 P
M11 4 3 0 0 N
M12 4 3 1 1 P
M13 5 4 0 0 N
M14 5 4 1 1 P
VDD 1 0 DC 5V

Vclk 3 0 PULSE (0V 5V 50N 0N 0N 50N 100N)
Vd 2 0 PWL (0N 0V 15N 0V 15N 5V 35N 5V 35N 0V
+ 65N 0V 65N 5V 85N 5V 85N 0V 125N 0V
+ 125N 5V 175N 5V 175N 0V 220N 0V)
.TRAN 0.1N 220N

.END


Simulation results, obtained with PSpice (described in Appendix B), are shown in Figure 25.11(c). 
As can be observed, they are very similar to the expected results shown in Figure 25.11(b), borrowed 
from Section 13.3 (Figure 13.5). 


25.11 AC Response Example 


We present in this section a SPICE simulation focused on the AC response. Even though this type of
analysis only applies to linear (hence analog) circuits, an example is included below to illustrate the
usefulness of SPICE even further.


■ EXAMPLE 25.5 AC RESPONSE OF AN RC CIRCUIT


Figure 25.12(a) shows an RC low-pass filter. Estimate its frequency response, then obtain the Bode
plot for the magnitude of Vout/Vin using SPICE code and compare the result against the predictions.


SOLUTION 


The RC circuit of Figure 25.12(a) is a single-pole low-pass filter whose transfer function is given by

$$G(s) = \frac{G_0}{1 + s/\omega_c}, \quad \text{where} \quad G_0 = \frac{R_2}{R_1 + R_2} \quad \text{and} \quad \omega_c = \frac{1}{(R_1 \parallel R_2)\,C_1}.$$

$G_0$ is the low-frequency gain, while $\omega_c$ is the circuit’s only pole (given in rad/s). In this case, $\omega_c$
also represents the circuit’s cutoff frequency, that is, the frequency at which the gain is 3 dB lower
than $G_0$ ($20\log|G|_{\omega=\omega_c} = 20\log|G_0| - 3$ or, equivalently, $|G|_{\omega=\omega_c} = 0.707|G_0|$). With the values
given in Figure 25.12(a), $\omega_c = 2$ krad/s and $G_0 = 0.5$ result. Therefore, the cutoff frequency, in Hz, is
$f_c = \omega_c/2\pi = 318$ Hz, and $G_0$, in decibels, is $20\log G_0 = -6$ dB.

A SPICE code for this circuit is shown below. It again starts with a comment, followed by three 
sections, which contain the circuit, analysis specifications, and finally output specifications. Note 
that the source is an independent AC source with sinusoidal sweep, described in Section 25.6. The 
option with logarithmic sweep (DEC) was chosen, with 100 points/decade, starting at 10Hz and 
finishing at 20 kHz, hence encompassing the cutoff frequency of 318 Hz estimated above. 
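
Before running the simulation, the predicted figures for a single-pole filter of this kind can be reproduced numerically. The Python sketch below assumes the component values that appear in the netlist of this example (two 1 kΩ resistors and a 1 µF capacitor) and ideal components; the variable and function names are hypothetical.

```python
import math

R1, R2, C1 = 1e3, 1e3, 1e-6          # values taken from the netlist of this example

g0 = R2 / (R1 + R2)                  # low-frequency gain of the resistive divider
r_par = R1 * R2 / (R1 + R2)          # R1 in parallel with R2, seen by the capacitor
wc = 1.0 / (r_par * C1)              # pole (cutoff) frequency, in rad/s
fc = wc / (2 * math.pi)              # same frequency, in Hz


def gain_db(f_hz):
    """Magnitude of G(j*2*pi*f) = g0 / (1 + j*2*pi*f/wc), in decibels."""
    g = g0 / complex(1.0, 2 * math.pi * f_hz / wc)
    return 20 * math.log10(abs(g))


print(g0, wc, round(fc, 1))
print(round(gain_db(10.0), 2), round(gain_db(fc), 2))
```

The gain at the cutoff frequency comes out 3 dB below the low-frequency gain, which is the point to look for with the cursor on the simulated Bode plot.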
FIGURE 25.12. (a) RC low-pass filter of Example 25.5; (b) Simulation results (Bode plot) obtained with PSpice.


*-------------------------------------------
* RC circuit of Example 25.5
*-------------------------------------------
R1 1 2 1k
R2 2 0 1k
C1 2 0 1u
*-------------------------------------------
Vin 1 0 DC 0V AC 1V
.AC DEC 100 10Hz 20KHz

.END


Simulation results, obtained with PSpice (described in Appendix B), are shown in Figure 25.12(b).
Notice near the horizontal axis that the option DB(...) was employed to plot the Bode diagram for
the magnitude of V(2)/V(1). As can be seen, the low-frequency gain is -6 dB, as expected, and the
cutoff frequency (where the gain is -6 dB - 3 dB = -9 dB) is around 318 Hz (this value can be observed
with precision using the cursor in the oscilloscope-like screen created by .PROBE within PSpice).


25.12 Monte Carlo Analysis 


Monte Carlo is a statistical analysis in which device parameters are randomly varied by the simulator
within the specified tolerance range. The .MC command is used, whose syntax is shown below, where
#runs is the number of runs (the first run is always performed with the nominal parameter values),
analysis is the analysis type (DC, AC, or transient), and output_var is the output variable to be
measured; function and option are explained below.


.MC #runs analysis output_var function [option] 
The alternatives for function are as follows. 


MAX: Finds the maximum value of the specified variable. 

MIN: Finds the minimum value of the specified variable. 

YMAX: Finds the maximum difference from the nominal (first) run. 

RISE_EDGE value: Finds the first crossing of the variable above the specified value. 
FALL_EDGE value: Finds the first crossing of the variable below the specified value. 


The main alternatives for option are listed next. 


LIST: Prints in the .out file the parameter values used in the first (nominal) run. The results are
obviously available to be displayed with the .PROBE command.


LIST OUTPUT ALL: Prints in the .out file the parameter values used in all runs. All results can again
be displayed using .PROBE.


Examples of Monte Carlo analysis declarations: 


.MC 8 DC V(5) MAX ;8 runs of a DC response for V(5)
.MC 8 DC V(5) MAX LIST
.MC 8 AC V(5) MAX LIST OUTPUT ALL
.MC 8 TRAN V(5) YMAX
.MC 8 TRAN V(5) MIN


To enter the parameter tolerances, the .MODEL command must be used. Its syntax for resistors, capaci- 
tors, and inductors is illustrated in the examples below. 


.MODEL Rmodel RES (R=1 DEV=15%)
.MODEL Cmodel CAP (C=1 DEV=10%)
.MODEL Lmodel IND (L=1 DEV/GAUSS=0.1)


In these examples, the names chosen for the models are Rmodel, Cmodel, and Lmodel. The tolerance can
be for the individual device (DEV) or for the lot (LOT); in the former the variations are independent, whereas 
in the latter the same variation is assigned to all devices of the same model. The tolerance can be speci- 
fied using percentage or absolute value, and with uniform or Gaussian distribution. The options between 
parentheses also include a multiplication factor (=1 in the examples above), which affects the parameter’s 
nominal value. A complete example, illustrating the use of Monte Carlo analysis, is shown next. 


■ EXAMPLE 25.6 MONTE CARLO ANALYSIS OF A DR CIRCUIT


Assume that the resistor (R1) in the DR (diode-resistor) circuit of Example 25.1 (Figure 25.8(a)) exhibits
a 20% tolerance. Repeat that simulation, now for a single value for R1 (= 1 kΩ), but with the tolerance
above and Monte Carlo statistics included. Vin must again vary from 0 V to 2 V in steps of 10 mV.


SOLUTION 


A SPICE code for this example is shown below. It again starts with a comment in line 2,
followed by the models for the diode (lines 4-7) and for the resistor (line 8). Next, the circuit
proper is described (lines 10 and 11), which contains only two devices. The following section
contains the analysis specifications (DC combined with Monte Carlo; remember that the
latter is not an analysis on its own, but a complement to another analysis, namely DC,
AC, or transient). Line 14 says that it is a DC analysis, while line 15 specifies the associated
Monte Carlo analysis, where the number of runs is 3, the variable of interest is I(D1), its
maximum value must be measured, and the parameter values in all runs must be recorded. The
graphical results are depicted in Figure 25.13. Note that the first run is always for the
nominal parameter values, so the curve in the center coincides with the curve for R1 = 1 kΩ in
Figure 25.8(b). The corresponding .out file (not shown) informs that the first run was conducted
with R1 = 1 kΩ (nominal value, as expected), the second with R1 = 0.97702 kΩ, and the third with
R1 = 1.0575 kΩ.


1  *-------------------------------------------
2  *SPICE code for the circuit of Example 25.6.
3  *---- Diode model: --------------------------
4  .MODEL D1N4007 D(Is=14.11n N=1.984 Rs=33.89m
5  + Ikf=94.81 Xti=3 Eg=1.11
6  + Cjo=25.89p M=0.44 Vj=0.3245
7  + Fco=0.5 Ibv=10u Tt=5.7u Bv=1500)
8  .MODEL Rmodel RES (R=1 DEV=20%)
9  *---- Circuit: ------------------------------
10 R1 1 2 Rmodel 1K
11 D1 2 0 D1N4007
12 *---- DC+MC analyses: -----------------------
13 Vin 1 0
14 .DC Vin 0V 2V 10mV
15 .MC 3 DC I(D1) MAX LIST OUTPUT ALL
16 *---- Output options: -----------------------
17 .PROBE I(D1)
18 .END
19 *-------------------------------------------


FIGURE 25.13. Results from the combined DC + Monte Carlo analysis for the circuit of Figure 25.8(a).
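
The idea behind a DEV tolerance can be mimicked outside the simulator. The Python sketch below draws resistor values within a ±20% range, keeping the first run nominal as described above, and checks that the two values reported in the .out file fall inside that range. It assumes a uniform distribution and uses Python's own random generator, so the numbers it produces will not match those of PSpice; the function names are hypothetical.

```python
import random


def monte_carlo_r(nominal, tol, runs, seed=0):
    """Resistor values for each run; the first run always uses the nominal value."""
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    values = [nominal]
    for _ in range(runs - 1):
        values.append(nominal * (1.0 + rng.uniform(-tol, tol)))
    return values


def within_tolerance(value, nominal, tol):
    """True if value lies inside nominal*(1 - tol) .. nominal*(1 + tol)."""
    return nominal * (1.0 - tol) <= value <= nominal * (1.0 + tol)


values = monte_carlo_r(1000.0, 0.20, 3)
print(values)
# Values reported by the simulator in this example (in ohms):
print(within_tolerance(977.02, 1000.0, 0.20), within_tolerance(1057.5, 1000.0, 0.20))
```

With only three runs the spread observed in Figure 25.13 is narrow; a larger number of runs is needed before the extremes of the tolerance range are likely to be explored.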
25.13 Subcircuits


We conclude this chapter by showing how hierarchical circuits can be built for simulations using SPICE
code. To define a subcircuit, its code must start with .SUBCKT and finish with .ENDS. When a
subcircuit is instantiated, the name of the instance must start with X. This technique is illustrated in the
example below.


■ EXAMPLE 25.7 DFF CONSTRUCTED WITH SUBCIRCUITS


We saw in Chapter 13 that a D-type flip-flop (DFF) can be constructed using two D-type latches
(DLs) operating in a master-slave configuration (see Figure 13.14(a)). Write a SPICE code to
simulate a positive-edge DFF using the .SUBCKT command to instantiate two DLs and also the inverters
needed to process the clock. As in Example 25.4, assume that lambda is 0.3 µm and adopt minimum
size for all transistors except for the inverters that process the clock and the output inverter of each
DL, which must be designed with (W/L)n = 6λ/2λ and (W/L)p = 12λ/2λ.
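
Since the transistor sizes are given in multiples of lambda, it helps to convert them to micrometers before writing the netlist. The Python sketch below performs this conversion for the minimum-size transistor (3λ/2λ, as in Example 25.4) and for the two larger sizes requested here; the function name size_um is hypothetical.

```python
LAMBDA_UM = 0.3   # lambda assumed in this example, in micrometers


def size_um(w_lambdas, l_lambdas, lam=LAMBDA_UM):
    """Convert a W/L given in multiples of lambda to (W, L) in micrometers."""
    return (w_lambdas * lam, l_lambdas * lam)


for name, w, l in (("minimum", 3, 2), ("large nMOS", 6, 2), ("large pMOS", 12, 2)):
    w_um, l_um = size_um(w, l)
    print(name, round(w_um, 2), "um /", round(l_um, 2), "um")
```

These are the values that would be entered as DEFW and DEFL defaults (0.9 µm and 0.6 µm) and as explicit W parameters (1.8 µm and 3.6 µm) for the larger transistors.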


SOLUTION 


Figures 25.14(a) and (b) show the same pair of inverters and DL simulated in Example 25.4 (see Figure
25.11). Each of these units is considered a subcircuit in the present example, called INVERTERS and
DLATCH, respectively. The use of these subcircuits to construct the DFF is shown in Figure 25.14(c).
Observe that, for it to be a positive-edge DFF, the first DL (master) must be transparent when the
clock is low, while the second (slave) must be transparent when the clock is high (see the connections
between the clock lines and the DLs). Note that the enumeration of the internal nodes does not need
to be different in one subcircuit with respect to the other (Figures 25.14(a) and (b)), but external
repetitions obviously cannot occur in the final circuit (Figure 25.14(c)).


FIGURE 25.14. DFF constructed with subcircuits: (a) Inverters needed to process the clock (subcircuit
INVERTERS); (b) D latch (subcircuit DLATCH); (c) Complete DFF, constructed with three subcircuits.

A SPICE code for this example is presented below. It starts with a comment in line 2, followed
by global declarations in lines 4 and 5 (the file containing the MOSFET models and the default values
for transistor sizes). The next section (lines 7-18) contains the DLATCH subcircuit (note that it starts
with .SUBCKT and ends with .ENDS). The other subcircuit (INVERTERS) is in lines 20-25. The main
circuit is in lines 27-30. Observe that the names of the subcircuit instances start with X. The next
section (lines 32-35) specifies the type of analysis (transient in this case) and the waveforms needed
to perform the tests. The final section (lines 37 and 38) contains the output specifications (PROBE was
chosen) and the mandatory .END command.


1  *--------------------------------------------------
2  *D Flip-Flop of Example 25.7
3  *--------------------------------------------------
4  .INC mos_models.cir
5  .OPTIONS DEFW=0.9u DEFL=0.6u DEFAD=5p DEFAS=5p
6  *--------------------------------------------------
7  .SUBCKT DLATCH 1 2 3 4 7
8  M1 5 4 2 0 N
9  M2 5 3 2 1 P
10 M3 6 5 0 0 N
11 M4 6 5 1 1 P
12 M5 7 6 0 0 N W=1.8u
13 M6 7 6 1 1 P W=3.6u
14 M7 8 6 0 0 N
15 M8 5 3 8 0 N
16 M9 5 4 9 1 P
17 M10 9 6 1 1 P
18 .ENDS
19 *--------------------------------------------------
20 .SUBCKT INVERTERS 1 2 3 4
21 M1 3 2 0 0 N W=1.8u
22 M2 3 2 1 1 P W=3.6u
23 M3 4 3 0 0 N W=1.8u
24 M4 4 3 1 1 P W=3.6u
25 .ENDS
26 *--------------------------------------------------
27 X1 1 2 7 6 4 DLATCH
28 X2 1 4 6 7 5 DLATCH
29 X3 1 3 6 7 INVERTERS
30 VDD 1 0 DC 5V
31 *--------------------------------------------------
32 Vclk 3 0 PULSE (0V 5V 50n 0n 0n 50n 100n)
33 Vd 2 0 PWL (0n 0V 25n 0V 25n 5V 65n 5V 65n 0V
34 + 85n 0V 85n 5V 125n 5V 125n 0V 200n 0V)
35 .TRAN 0.1n 220n
36 *--------------------------------------------------
37 .PROBE V(2) V(3) V(4) V(5)
38 .END
39 *--------------------------------------------------
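
The transistor widths used in the listing follow directly from the λ-based sizes given in the problem statement. As a quick check, the Python sketch below converts sizes expressed in multiples of λ into micrometers for λ = 0.3 μm. The helper name `size_um` is hypothetical (introduced only for this illustration), and the minimum size of 3λ/2λ is an assumption that is consistent with the DEFW and DEFL values in line 5.

```python
LAMBDA_UM = 0.3  # lambda for Example 25.7, in micrometers


def size_um(w_lambdas, l_lambdas, lam=LAMBDA_UM):
    """Convert a (W/L) size given in multiples of lambda into (W, L) in micrometers."""
    return round(w_lambdas * lam, 3), round(l_lambdas * lam, 3)


# Assumed minimum size (3*lambda / 2*lambda); should match DEFW and DEFL in .OPTIONS
minimum = size_um(3, 2)
# Sizes requested for the clock inverters and for the output inverter of each latch
nmos_inv = size_um(6, 2)
pmos_inv = size_um(12, 2)

for name, (w, l) in [("minimum", minimum), ("nMOS", nmos_inv), ("pMOS", pmos_inv)]:
    print(f"{name}: W={w}u L={l}u")
```

The results reproduce DEFW=0.9u and DEFL=0.6u, as well as the W=1.8u and W=3.6u values attached to the larger transistors.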


Simulation results (obtained with PSpice, described in Appendix B) are depicted in Figure 25.15. The
first waveform shows the clock, with period = 100 ns, while the second shows the data (d) input. The
third plot shows the output of the master latch, which is transparent while clock = '0', and the last plot
exhibits the output of the slave latch, which is the DFF’s output. As can be seen, the circuit does behave
as expected and exhibits the following clk-to-q low-to-high and high-to-low propagation delays:
$t_{pLH} = 15.6$ ns and $t_{pHL} = 21$ ns (these two values are indicated in the figure).
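
Before inspecting the analog waveforms, it can help to know which logic transitions to expect. The Python sketch below is an idealized, zero-delay model of the master-slave arrangement driven by the same stimuli as Vclk and Vd in the listing. It is only a behavioral approximation (the function names and the initial state of 0 are assumptions, not part of the SPICE code), so it predicts where q changes but not the propagation delays.

```python
def clk(t_ns):
    """Ideal logic version of Vclk: PULSE(0V 5V 50n 0n 0n 50n 100n)."""
    if t_ns < 50:
        return 0
    return 1 if (t_ns - 50) % 100 < 50 else 0


def d(t_ns):
    """Ideal logic version of Vd (PWL): high during 25-65 ns and 85-125 ns."""
    return 1 if (25 <= t_ns < 65) or (85 <= t_ns < 125) else 0


def simulate_dff(t_end_ns=220):
    """Zero-delay master-slave model sampled every 1 ns.

    Returns rows of (t, clk, d, master, q).
    """
    master = q = 0  # assumed initial state
    rows = []
    for t in range(t_end_ns + 1):
        if clk(t) == 0:
            master = d(t)  # master latch is transparent while the clock is low
        else:
            q = master     # slave latch is transparent while the clock is high
        rows.append((t, clk(t), d(t), master, q))
    return rows


rows = simulate_dff()
# Times (in ns) at which q changes, with the new value
q_changes = [(cur[0], cur[4]) for prev, cur in zip(rows, rows[1:]) if cur[4] != prev[4]]
print(q_changes)  # [(50, 1), (150, 0)]
```

In this ideal model q rises at the clock edge at 50 ns and falls at the clock edge at 150 ns, which are the two edges from which the clk-to-q delays are measured in the simulated waveforms.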


FIGURE 25.15. Transient response obtained with PSpice for the DFF of Example 25.7 (Figure 25.14).


25.14 Exercises Involving Combinational Logic Circuits


1. SPICE simulation of an AND gate


Figure E25.1(a) shows an AND gate constructed with CMOS logic. All transistors are already named 
and all nodes are numbered. 


a. Suppose that the propagation delays low-to-high and high-to-low (see Figure 4.8) are both 2 ns.
Sketch the output signal (y) in Figure E25.1(b).


b. Simulate this circuit with SPICE. Use the BSIM3v3 model for the MOSFETs (similar to that in
Section 25.4; consult the MOSIS site at www.mosis.org). Assume that $\lambda = 0.1$ μm and adopt
$(W/L)_n = 6\lambda/2\lambda$ for the nMOS transistors and $(W/L)_p = 12\lambda/2\lambda$ for the pMOS ones.
Generate the signals a and b shown in Figure E25.1(b) to verify the circuit’s transient response. Finally,
compare the result against the sketch made in part (a). What are the values of $t_{pLH}$ and $t_{pHL}$
in this case?
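
Propagation delays such as those requested above can be read with the cursor, but they can also be computed from exported waveform samples. The Python sketch below is one possible approach, assuming that the delay is measured between the 50% crossings of the input and output signals and that the analyzed window contains a single edge of each signal. The function names and the sample data are hypothetical.

```python
def crossing_time(times, values, level, rising):
    """First time at which the sampled waveform crosses `level` in the given
    direction, using linear interpolation between samples (None if it never does)."""
    samples = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        goes_up = rising and (v0 < level <= v1)
        goes_down = (not rising) and (v0 > level >= v1)
        if goes_up or goes_down:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    return None


def propagation_delay(times, vin, vout, vdd, in_rising, out_rising):
    """Delay between the 50% crossings of vin and vout (same units as `times`)."""
    half = vdd / 2.0
    t_in = crossing_time(times, vin, half, in_rising)
    t_out = crossing_time(times, vout, half, out_rising)
    return t_out - t_in


# Hypothetical samples (time in ns, voltages in V) for a non-inverting gate
t = [0, 1, 2, 3, 4, 5]
vin = [0, 0, 5, 5, 5, 5]
vout = [0, 0, 0, 0, 5, 5]
t_plh = propagation_delay(t, vin, vout, vdd=5.0, in_rising=True, out_rising=True)
print(t_plh)  # 2.0
```

For an inverting gate, the output direction is simply the opposite of the input direction (for example, `in_rising=True, out_rising=False`).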


FIGURE E25.1. 


2. SPICE simulation of a compound gate 


After solving the exercise above, simulate the combinational circuit of Example 4.2 (repeated in 
Figure E25.2). Enter each gate as a subcircuit. Use the same transistor models and sizes employed in 
the previous exercise and the same stimuli employed in Example 4.2 (Figure 4.11). 


FIGURE E25.2. 


3. SPICE simulation of a multiplexer 


A TG-based 2 × 1 multiplexer was described in Example 11.6 and then used in Example 11.7 to
construct a larger multiplexer.


a. Write a SPICE code to simulate the TG-based 2 × 1 multiplexer of Figure 11.16(b). This time
assume that $\lambda = 0.2$ μm and adopt minimum size for the nMOS transistors and twice that for the
pMOS ones (that is, $(W/L)_n = 3\lambda/2\lambda$ and $(W/L)_p = 6\lambda/2\lambda$). Use either Level 3 or BSIM3v3
models for the transistors (consult the MOSIS site at www.mosis.org). Generate suitable signals to verify
the multiplexer’s functionality.


b. Using .SUBCKT to instantiate the multiplexer of part (a) above, construct a SPICE code and test
the transient response and operation of the 2 × 3 multiplexer of Figure 11.17.


4. SPICE simulation of a Schmitt trigger


Three Schmitt trigger (ST) circuits were described in Section 11.13. Write a SPICE code and simu- 
late the ST of Figure 11.30(a). Adopt the same model used in the previous exercise for the transis- 
tors, with the following sizes: (W/L)yj5=3\/2X, (W/L), m2, me = OA/ 2A, and (W/L) x93, a= 124/20. 
Assume that $V_{DD} = 5$ V and employ a DC sweep from 0 V to 5 V, then from 5 V back to 0 V, in steps
of 5 mV, to obtain the circuit’s DC response. Compare the overall results against the curves in
Figure 11.29(b), but observe that the present circuit is an inverter. What are the transition voltages
and the corresponding hysteresis?


25.15 Exercises Involving Combinational Arithmetic Circuits


5. SPICE simulation of a full-adder unit


The full-adder (FA) circuit was studied in Section 12.2, with Figures 12.1(b) and 12.2(b) repeated in 
Figure E25.5 below. 


a. Name all transistors and enumerate all nodes in the circuit of Figure E25.5. 


b. Write a SPICE code and simulate the transient response of this circuit to test its operation. Adopt 
for the transistors the same models and sizes of Exercise 25.3. The following propagation delays 
should be measured: 

- $t_{pLH}$ and $t_{pHL}$ from cin to cout.
- $t_{pLH}$ and $t_{pHL}$ from a (or b) to cout.
- $t_{pLH}$ and $t_{pHL}$ between cout and s.


FIGURE E25.5.


6. SPICE simulation of a carry-ripple adder 


a. Using .SUBCKT to instantiate four times the FA tested in the previous exercise, simulate the 
transient response of the carry-ripple adder of Figure 12.3(b) to test its operation. 


b. We know that the main advantage of this adder is its compactness, but, in exchange, it is slower
than other architectures because the carry bits must propagate through all stages. Using proper
stimuli, such that cin causes all carry bits to change, measure the carry propagation delay through
each stage and also the accumulated value (from cin to cout). Are $t_{pLH}$ and $t_{pHL}$ necessarily equal?
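
To choose stimuli for which cin makes every carry bit change, a logic-level model of the adder can be checked first. The Python sketch below (a behavioral illustration only, with hypothetical function names) shows that holding a = "1111" and b = "0000" puts every stage in propagate mode, so toggling cin toggles all carries.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry(a_bits, b_bits, cin):
    """Carry-ripple adder; bit lists are LSB first.

    Returns (sum bits, carry out of each stage).
    """
    sums, carries = [], []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
        carries.append(carry)
    return sums, carries


a = [1, 1, 1, 1]  # a = "1111"
b = [0, 0, 0, 0]  # b = "0000"
_, carries_cin0 = ripple_carry(a, b, 0)
_, carries_cin1 = ripple_carry(a, b, 1)
print(carries_cin0, carries_cin1)  # [0, 0, 0, 0] [1, 1, 1, 1]
```

With these inputs held constant, a pulse applied to cin in the SPICE simulation exercises the whole carry chain, so the delay can be measured after each stage and from cin to the last carry.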




7. SPICE simulation of a comparator 


Digital comparators were studied in Section 12.7. In this exercise we want to obtain the transient 
response of the comparator seen in Figure 12.15(b) (repeated in Figure E25.7 below) to check its 
operation. Create a subcircuit for the FAs (use the code written to solve Exercise 25.5) and another 
for the NOR gate, then instantiate them to obtain the complete circuit. Using appropriate
waveforms, simulate the circuit and check its overall behavior.


FIGURE E25.7. 


8. SPICE simulation of an array multiplier 


Multipliers were studied in Section 12.9, where a parallel multiplier (also called array multiplier) was
shown in Figure 12.20. That figure is repeated in Figure E25.8, this time for 3-bit inputs. Observe that
this circuit employs a standard cell, shown on the left of Figure E25.8. Because the FA of Exercise 25.5
will be needed, employ for the transistors the same specifications contained in that exercise.


FIGURE E25.8.


a. Write two separate pieces of SPICE code, one for the AND gate and the other for the FA
(full-adder) that compose the standard cell.


b. Use .SUBCKT to instantiate the AND and FA units to obtain the standard multiplying cell (call it
STD_CELL).


c. Use .SUBCKT again to instantiate 12 times STD_CELL to create the full 3-bit multiplier of
Figure E25.8. Be careful about what to do with unused inputs.


d. Finally, simulate your design using proper stimuli and check its overall operation. What is the
longest-delay path in this circuit?


25.16 Exercises Involving Registers


9. SPICE simulation of a D latch


D-type latches (DLs) were studied in Section 13.3, with several architectures presented in Figures 13.8
(static) and 13.9 (dynamic). This exercise concerns the transient response of the SRAM latch shown
in Figure 13.8(e). Adopt for the transistors the same specifications of Example 25.3 and apply proper
input signals to verify this circuit’s transient response and operation.


10. SPICE simulation of a DFF


Figure E25.10 shows a dynamic TSPC (true single-phase clock) D-type flip-flop (DFF), which was 
seen in Section 13.5 (see Figure 13.17(c)). 


FIGURE E25.10. 


a. Assuming that the DFF’s propagation delay from clk to q is $t_{pCQ} = 5$ ns for both low-to-high and
high-to-low transitions, draw the resulting waveform for q in the graph of Figure E25.10. Consider
that the clock period is 50 ns and that the DFF’s initial state is q = '0'.


b. Now name all transistors and enumerate all circuit nodes, then write a SPICE code and simulate
the transient response of this circuit by applying to it the signals clk and d depicted in the figure.
Compare the shape of q produced by the simulator to that sketched in part (a) above. Adopt for
the transistors the parameters of Exercise 25.3 (or choose some other).


c. What are the values of $t_{pCQ}$ in the low-to-high and high-to-low transitions of this circuit?


11. SPICE simulation of a TFF 


Figure 13.24 shows the construction of T-type flip-flops (TFFs) from D-type flip-flops (DFFs). Using 
SPICE code, test the TFF of Figure 13.24(a). For the DFF, use the same circuit simulated in the
previous exercise. Apply a clock signal to verify its transient response and operation.


12. SPICE simulation of a dual-edge DFF 


Dual-edge DFFs were described in Section 13.7, where a mux-based architecture was presented in 
Figure 13.20(a). 


a. Assuming that the DFF’s propagation delay from clk to q is $t_{pCQ} = 5$ ns for both low-to-high and
high-to-low transitions, draw the resulting waveform for q in the graph of Figure E25.12. Consider
that the clock period is 50 ns and that the DFF’s initial state is q = '0'.


b. Write two separate pieces of SPICE code, one for the DL (D latch) and the other for the MUX
(multiplexer). The code from Example 25.4 or from Exercise 25.9 can be used for the former,
while the code from Exercise 25.3 can be employed for the latter.


c. Use .SUBCKT to instantiate the DL and MUX units to obtain the complete dual-edge DFF of
Figure 13.20(a).


d. Finally, simulate your design using the stimuli of Figure E25.12. Check its overall operation and
compare the result against your sketch for q.


FIGURE E25.12.


13. SPICE simulation of an asynchronous counter


Asynchronous counters were described in Section 14.3. The circuit of Figure 14.15(b), which is a 
4-bit downward counter, was repeated in Figure E25.13 below. Each cell is simply a TFF (already 
simulated in Exercise 25.11, so employ the same transistor models and sizes). 


a. Using the clock as a reference, and assuming a propagation delay of 5 ns in each TFF, draw the
output signals ($q_0$, $q_1$, $q_2$, $q_3$). Check whether the circuit counts downward.


b. Entering each TFF as a subcircuit, write a SPICE code for this counter and simulate its transient
response. Check whether the outputs are functionally similar to those sketched in part (a) above.
Also measure the propagation delays produced in the simulation.


FIGURE E25.13.


14. SPICE simulation of a serial adder


A serial adder was described in Section 12.4 (see Figure 12.12). Check its transient response and
operation using SPICE. The two subcircuits needed in this case (FA and DFF) can be those of
Exercises 25.5 and 25.10.


15. SPICE simulation of an SR with load


A shift register (SR) with data-load capability was seen in Section 14.1 (see Figure 14.2(b)). Using
.SUBCKT to construct the MUX and DFF units, simulate its transient response and check its overall
operation. Note that MUX and DFF circuits were already simulated in Exercises 25.3 and 25.10.


16. SPICE simulation of a data scrambler


Data scrambler-descrambler circuits were studied in Section 14.8. This exercise concerns the
operation of the scrambler shown in Figure 14.34(a) (repeated in Figure E25.16 below). Obtain its
transient response by applying to it the same data sequence of Example 14.13, then check whether the
resulting output sequence matches that in Figure 14.34. Note that the two subcircuits (FA and DFF)
needed in this case can be those simulated in Exercises 25.5 and 25.10.


FIGURE E25.16. 


ModelSim Tutorial 


Objective: This tutorial, which is a complement to Chapter 24, briefly describes ModelSim, from 
Mentor Graphics, a popular simulator for VHDL- and Verilog-based designs. The tutorial is based on 
ModelSim-Altera Web Edition 6.1g, available free of charge from www.altera.com. 


The presentation is divided into two parts, as follows: 
■ Part I: Simulation Procedure


In this part of the tutorial the simulation files are entered directly into the simulation environment,
and the steps needed to process them are described.


■ Part II: Creating a New Project


In this case a project is created before entering the simulation files, which improves code
organization and reusability. The subsequent simulation procedure is exactly the same as that in Part I.


About ModelSim 


A simplified view of ModelSim’s components and respective design flow is presented in Figure A.1. The 
diagram shows the VHDL (or Verilog) files at the top, which are combined with the VHDL (or Verilog) 
libraries by the first two components, vlib and vmap. Next appears the compiler (vcom for VHDL, vlog for 
Verilog), and finally the simulator (vsim). 
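
The same flow can also be driven from a command prompt instead of the graphical interface. The Python sketch below only assembles the command lines for the four tools named above, using the file and design-unit names of this tutorial. The helper `modelsim_commands` is hypothetical, and the options shown (`-c` for command-line mode and `-do` for a command string) should be checked against the documentation of the installed ModelSim version.

```python
import shlex


def modelsim_commands(source_files, top_unit, run_time="1 us", library="work"):
    """Build the command sequence vlib -> vmap -> vcom -> vsim (nothing is executed)."""
    return [
        ["vlib", library],               # create the library directory
        ["vmap", library, library],      # map the logical library name to that directory
        ["vcom", *source_files],         # compile the VHDL files, design file first
        ["vsim", "-c", f"{library}.{top_unit}", "-do", f"run {run_time}; quit -f"],
    ]


cmds = modelsim_commands(
    ["clock_divider.vhd", "test_clock_divider.vhd"], "test_clock_divider"
)
for cmd in cmds:
    print(shlex.join(cmd))
```

Each printed line can be typed at a shell prompt in the directory that holds the two VHDL files, or passed to `subprocess.run` if the ModelSim executables are on the search path.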

As seen in Chapter 24, at least two files are needed to simulate a circuit. One is referred to as the design 
file because it must contain the code from which the DUT is inferred. The other is referred to as the test 
file because it contains the testbench (input stimuli plus, optionally, output verifiers). 

In Part I of this tutorial, these files are entered directly into the simulation environment, while in Part II 
a project is created first, within which the same files are then located (directly or indirectly). The simulation 
procedure is exactly the same in both cases, but when a project is created its status is stored (in a project 
file with extension .mpf) so it can be continued later, also improving code organization and reusability. 


Part I: Simulation Procedure

This part contains the following sections:

1. Preparing the VHDL design and test files
2. Creating the simulation library
3. Compiling the files
4. Running the simulation
5. Inserting breakpoints
6. Examining the results with the cursor
7. Examining the results with the zoom
8. Closing the simulation


FIGURE A.1. Simplified diagram of ModelSim’s components and design flow.

1. Preparing the VHDL Design and Test Files 


The files used in this tutorial are those in Example 24.2. The first file, called clock_divider.vhd, is a design 
file for a circuit that divides the clock frequency by 10. The second file, called test_clock_divider.vhd, is a 
test file that creates the external stimuli needed to test that design. 


a. Create a directory where all simulation files should be saved.


b. If preexisting files are used, simply copy them to the directory created above and go to part 2. 
Otherwise, proceed to step (c). 


c. Start ModelSim (ignore the welcome dialog boxes). 


d. Open its editor by selecting File > New > Source > VHDL. Click the Zoom/Unzoom Window
icon to enlarge the editing window. Now type the design file (clock_divider.vhd) shown in
Example 24.2 and save it in the directory created in step (a).


e. Repeat the procedure above for the test file (test_clock_divider.vhd) of Example 24.2 and save it in
the same directory. Click the Zoom/Unzoom Window icon again to return the main window to
its original size.


2. Creating the Simulation Library
a. Start ModelSim if not done yet (ignore the welcome dialog boxes). 
b. Select File>Change Directory and select the directory created in step 1(a). 


c. Select File >New> Library and type work in the Library Name and Library Physical Name fields 
(in case they are not entered automatically) and click OK. 


d. In the Workspace (Library tab) check the presence of the new library (work) at the top of the list.
This is a folder created inside the directory created above. 


3. Compiling the Files 
a. To compile the design, click the Compile icon or select Compile > Compile. This opens the Compile
Source Files dialog box shown in Figure A.2.


b. With both files selected (Figure A.2), click Edit Source, which will cause both VHDL files to be
opened in the main window. Even though this step is optional, it is very likely that you will need
to modify/debug the files anyway.


c. With both files selected (Figure A.2), click Compile. When finished, click Done. 
Note: In case you are debugging a file, you can compile it separately until the problems are fixed. 


d. In the Workspace, select the Library tab and click the ‘+’ icon next to work to confirm the presence
of the design units corresponding to the two files compiled above (see Figure A.3). 


4. Running the Simulation 


a. Left double-click test_clock_divider in the Workspace (see Figure A.3). Alternatively, you can select
Simulate > Start Simulation, then click the ‘+’ icon to expand the work library, select test_clock_divider,
and click OK. The Objects window will be opened with the three signals shown in the blue part of
Figure A.4. Note that a new tab, called sim, was included in the Workspace.


FIGURE A.2.


FIGURE A.4.


b. In the sim tab of the Workspace, right-click test_clock_divider and select Add >Add to Wave. The 
new situation will be that depicted in Figure A.5 with the wave pane included (black area). 


c. Any signal in the wave window can be dragged up or down (normally clock and reset are wanted 
at the top). To do so, press and hold the left mouse button on the signal’s name and move it to the 
desired position. 


d. Set the simulation time interval by selecting Simulate > Runtime Options, then type 1 μs in the
Default Run box. This means that every time the simulation icon is clicked the simulator will
advance an additional 1 μs time interval.


e. Run the simulation by clicking the Run Simulation icon (see Figure A.5) or by selecting
Simulate > Run > Run 1 μs.


f. Click the Zoom Full icon (see Figure A.5) to have the complete plot displayed in the window. 
The resulting waveforms are depicted in Figure A.6. 


g. Repeat steps (e-f) a few times and observe that the plot grows 1 μs each time.


h. Clean the waveforms window by clicking the Restart icon (see Figure A.5), then repeat steps
(e-f) two or three times until the plot is the size you want. Now inspect the results.


[Figure A.5 callouts: Simulation interval, Run simulation, Restart simulation, Zoom full]


FIGURE A.5.


FIGURE A.6.


5. Inserting Breakpoints 


a. Clean the waveforms window by clicking the Restart icon.
b. Display the design file (clock_divider.vhd) in the main window (see Figure A.7). 


c. The red-numbered lines accept breakpoints. Left-click 16 and 19, which inserts a red ball (break- 
point) next to each line number (see Figure A.7). 


d. Left-click the red ball to disable the breakpoint (black ball). Now right click it and select Remove 
Breakpoint to eliminate it. Finally, reinsert the breakpoints. 


e. Run the simulation by clicking the Run-All icon so the simulator will progress until the first
breakpoint is reached. Note that a blue pointer stops at line 16 and the values of all signals at
that point are updated in the Objects pane. Click the icon again several times, noting that the pointer
stops in line 16 during five iterations and in line 19 for one iteration.


f. This is another way of seeing the values. Highlight count in line 16, right-click it and select Examine. 
A balloon will be displayed with the time coordinate and the corresponding signal value. 


g. Finally, single-step through the code. Remove all breakpoints and click the Step icon to advance
one iteration at a time (the simulator jumps between red-numbered lines). Do it several times and
observe the values in the Objects window.


FIGURE A.7.


6. Examining the Results with the Cursor 


a. Observe the five cursor-related and four zoom-related icons in Figure A.8.


b. Undock the wave window by clicking the Undock icon (this will separate the wave pane from the
main window). Now maximize it.


c. Click the Insert Cursor icon (#1 in Figure A.8), then click anywhere in the black area of the wave
window. This will show a cursor with the time coordinate at its foot and also the corresponding
signal values in the column adjacent to the signal names. Drag the cursor back and forth and
observe the time and signal values changing.


d. With the mouse over one of the waveforms, click next to one of its transitions. Observe that in
this case the cursor snaps to the waveform edge.


e. Select one of the waveforms by clicking its name (as depicted in Figure A.8, where clk was
selected). Now click one of the Next Transition icons (#3 or #4 in Figure A.8) and note that the
cursor jumps to the next waveform transition.


f. Click the Insert Cursor icon (#1) again to insert a second cursor. Click on the new cursor to
highlight it, and note that now the Next Transition icons (#3 and #4) are related to this cursor.


g. Finally, click the Delete Cursor icon (#2 in Figure A.8) twice to remove both cursors.


7. Examining the Results with the Zoom 


a. Click the Zoom Full icon (#9 in Figure A.8) to have the plot fit the window.


b. Click the Zoom In 2x icon (#7 in Figure A.8) to have the plot enlarged, then click Zoom Full (#9)
again to return it to its original size.


FIGURE A.8.


FIGURE A.9.


c. Click the Zoom Out 2x icon (#8 in Figure A.8) to have the plot reduced. To restore it to its
previous size, this time, instead of clicking Zoom Full, press the letter “L” on the keyboard, which
restores the last view (alternatively, select View > Zoom > Zoom Last).


d. Click the Zoom Mode icon (#6 in Figure A.8) and go to the waveforms region. Click the mouse 
near one of the signal transitions and hold it while dragging to the other side of the transition, 
as illustrated in Figure A.9. When the mouse is released, that section will be zoomed in. Click 
the Zoom Full icon (#9) to restore the plot to its original size. 


e. Redock the wave window. 


8. Closing the Simulation 


a. Select Simulate > End Simulation > Yes to exit ModelSim or to start another simulation.


Part II: Creating a New Project


This part shows how to create a project before entering the simulation files. It contains the following 
sections: 


1. Creating the working directory
2. Setting the compile order
3. Compiling the project
4. Simulating the project


1. Creating the Working Directory 
a. Create a directory where the project should be saved. 
b. Start ModelSim. 


c. Select Create a Project in the Welcome dialog box or select File>New > Project. This will open 
the dialog of Figure A.10. Choose a name for the project, then browse to the directory created 
above, leave work as the default library name, and click OK. 


d. At the conclusion of step (c), the Add Items to the Project dialog box shown on the left of
Figure A.11 is opened. Click Add Existing File. In the next dialog (on the right of Figure A.11)
include the desired files (clock_divider.vhd and test_clock_divider.vhd), then check the Copy to
project directory option and click OK.


[Create Project dialog: Project Name, Project Location, Default Library Name (work)]


FIGURE A.10.


FIGURE A.11.


e. Check in Figure A.12 what the Workspace should look like at this point. 


2. Setting the Compile Order 
a. Select Compile >Compile Order, which opens the dialog box of Figure A.13. 


b. The files are already correctly ordered in this case (starting from the top). Press the Auto Generate 
button (Figure A.13), which will cause the compiler to read all project files to determine their 
sequential interdependences. Click OK (twice) when done. 


FIGURE A.12.


FIGURE A.13.


3. Compiling the Project


a. Select Compile >Compile All. The question marks shown in the column Status of Figure A.12 
should change to regular check marks. 


4. Simulating the Project 


a. Load the design by selecting the Library tab in the Workspace and left double-clicking
test_clock_divider (see Figure A.14). As before, this opens the Objects pane (blue area) with all
in/out signals contained in the design.

b. Note that Figure A.14 is similar to Figure A.4. Therefore, from this point on the procedure is 
exactly the same as that seen in Part I. Hence go to step 4(b) of Part I and proceed from there.


FIGURE A.14.


PSpice Tutorial 


Objective: This tutorial, which is a complement to Chapter 25, concisely describes the use of PSpice 
A/D to perform circuit simulations. The version used in this tutorial is Cadence OrCAD PSpice A/D 15.7. 


The tutorial is divided into two parts, depending on how the circuit to be tested is entered.

■ Part I: SPICE Simulation with Coded Input

In this case the circuit is entered using SPICE code (recommended).

■ Part II: SPICE Simulation with Graphical Input

In this case circuit schematics are entered instead (this is old fashioned and difficult to modify or
share).


Part I: SPICE Simulation with Coded Input 


In this case, the circuit is entered using a text file (that is, SPICE code). 


1. Entering the SPICE Code 


The RC circuit of Example 25.5 (repeated in Figure B.1) will be employed as an example, of which the 
transient response will be examined. 


a. Start PSpice A/D. A screen similar to Figure B.2 will be shown. 


b. Open a new text file by selecting File > New > Text File or by clicking the corresponding toolbar icon.


c. Type (or paste) the SPICE code for the circuit of Figure B.1, with transient analysis specifications
included, as follows, where $V_{in}$ consists of a square wave with 10 ms period and 50% duty cycle,
and the total simulation time is 25 ms divided in steps of 10 μs.


FIGURE B.1.


* RC circuit of Figure B.1 (Example 25.5)
R1 1 2 1k
R2 2 0 1k
C1 2 0 1u
Vin 1 0 PULSE (0V 5V 5ms 0 0 5ms 10ms)
.TRAN 10us 25ms
.END


d. Save the file above using the extension .cir (rc_circuit.cir). 


2. Simulation 


a. Make sure that the active file, shown in the toolbar’s file selector, is the correct one (rc_circuit.cir).


b. To run the simulation, select Simulation > Run rc_circuit or click the Run icon.


c. Once PSpice A/D concludes the simulation, the oscilloscope-like (PROBE) screen is opened
automatically, showing a black screen similar to Figure B.3 but without any plots.


d. To show the plots of Figure B.3, select Trace > Add Trace or click the Add Trace icon. Click on V(1)
and V(2) to have the voltages of nodes 1 (input) and 2 (output) exhibited (notice that these signals are
added to the Trace Expression field of the Add Traces dialog box when you click on their names). After
selecting both signals, click OK. The plots of Figure B.3 will be displayed with axis ranges
automatically set by the simulator.


FIGURE B.3.


3. Examining the Results 


a. Practicing with the axis settings: Change the run time from 25 ms to 15 ms by selecting Plot > Axis
Settings > X Axis > Data Range > User Defined and typing 0 s to 15 ms. A single pulse should now
result in the display.


b. Now change the vertical axis range to −1 V to 6 V by selecting Plot > Axis Settings > Y Axis > Data
Range > User Defined and typing −1 V to 6 V.


c. Practicing with the cursor: Click the cursor icon to add a cursor to the display. Click the mark
before V(2) in the plot legend to have the cursor attached to the output signal. Because this circuit’s
time constant is $\tau = (R_1 \parallel R_2) C_1 = 0.5$ ms, V(2) is expected to grow from 0 V to 63% of its final
value (2.5 V) in 0.5 ms (in other words, it should grow from 0 V to $(1 - e^{-1}) \cdot 2.5 = 1.58$ V in 0.5 ms).
Likewise, it is expected to decrease from 2.5 V to $(2.5 - 1.58) = 0.92$ V also in 0.5 ms. Check these
values by placing the cursor at the time = 5.5 ms coordinate and observing the values displayed at the
bottom right of the screen, then repeating the test for the cursor at time = 10.5 ms.


d. Practicing with multiple plots: Select Plot > Add Plot to Window. Next, select Trace > Add Trace
or click the Add Trace icon, then click on I(R1), I(R2), and I(C1) to have all three currents displayed.
At this point, the plots should look like Figure B.4.


e. Examine the new results. Note that the currents must obey I(R1) = I(R2) + I(C1), and that the
steady-state currents (when the voltages reach the regime values) should be I(R1) = I(R2) = 2.5/1k = 2.5 mA
and I(C1) = 0.
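
The values to be checked with the cursor can be computed beforehand. The Python sketch below evaluates the first-order response of the RC circuit, assuming R1 = R2 = 1 kΩ and C1 = 1 μF (values chosen here because they reproduce the 0.5 ms time constant and the 2.5 mA steady-state current quoted above) and assuming that V(2) has settled before each input edge.

```python
import math

R1 = R2 = 1e3   # ohms (assumed; consistent with 2.5 V / 1k = 2.5 mA)
C1 = 1e-6       # farads (assumed; gives tau = 0.5 ms)
V_HIGH = 5.0    # amplitude of the input pulse, in volts

r_parallel = R1 * R2 / (R1 + R2)     # resistance seen by the capacitor
tau = r_parallel * C1                # time constant
v_final = V_HIGH * R2 / (R1 + R2)    # steady-state V(2) while the input is high


def v2_rising(t_after_edge):
    """V(2) a time t after the input goes high, starting from 0 V."""
    return v_final * (1.0 - math.exp(-t_after_edge / tau))


def v2_falling(t_after_edge):
    """V(2) a time t after the input goes low, starting from the settled value."""
    return v_final * math.exp(-t_after_edge / tau)


print(f"tau = {tau * 1e3:.2f} ms")
print(f"V(2) at 5.5 ms  = {v2_rising(0.5e-3):.2f} V")
print(f"V(2) at 10.5 ms = {v2_falling(0.5e-3):.2f} V")
print(f"steady-state I(R1) = I(R2) = {v_final / R2 * 1e3:.1f} mA")
```

The input rises at 5 ms and falls at 10 ms, so the cursor positions 5.5 ms and 10.5 ms correspond to one time constant after each edge.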


Note: To copy the display to a regular text file, use the option Window > Copy to Clipboard.


FIGURE B.4.


Part II: SPICE Simulation with Graphical Input


This section concisely describes how PSpice can be used to perform simulations using graphical inputs 
(that is, circuit schematics). OrCAD Capture is now needed in addition to PSpice A/D. 


1. Drawing the Circuit 


The same RC circuit of Figure B.1 (repeated in Figure B.5) will be employed as an example, of which the 
transient response will again be determined. 


a. Start OrCAD Capture. 

b. Create a new project by selecting File > New > Project. Mark Analog or Mixed A/D, choose a 
name for the project, and provide the location where it should be saved. 

c. In the next window, select Create a Blank Project, which displays a blank drawing canvas. 

d. To place the resistors, select Place > Part or click the Place Part button, then Add Library > PSpice > analog.olb. 
Next, select R, then OK. A 1 kΩ resistor will be made available. Click the drawing area approximately 
where the resistor should be placed (it can be dragged around with the mouse). Repeat 
the procedure for the second resistor. To rotate it, use Edit > Rotate after selecting it. To leave the 
Place Mode, press ESC or right-click and select End Mode. 

e. Place the capacitor by selecting C from the same library. Click on 1n to change its value to 1u. 
In all devices, the units (F, V, A, Ω, s, etc.) are optional. 

f. Place the pulse generator by clicking the Place Part button again and selecting Add Library > PSpice > source.olb. 
Pick VPULSE and set the following parameter values (this is a 5 V square waveform with 10 ms 
period and 50% duty cycle): 

V1 (1st pulse voltage) = 0 V 
V2 (2nd pulse voltage) = 5 V 
TD (time delay) = 5 ms 
TR (rise time) = 0 
TF (fall time) = 0 
PW (pulse width) = 5 ms 
PER (period) = 10 ms 

g. Drag the parts to the desired positions (click and hold the left mouse button on the part to 
drag it). 

h. Now use Place > Wire or the Place Wire button to interconnect them. 

i. Finally, place the reference voltage (ground) by clicking the Place Ground button and selecting 0/Capsym. 

j. Save the project. 


[Schematic: pulse source Vin (V1 = 0 V, V2 = 5 V, TD = 5 ms, TR = 0, TF = 0, PW = 5 ms, PER = 10 ms) 
at node 1; R1 between nodes 1 and 2; C1 (1u) and R2 (1k) from node 2 to ground (node 0).] 

FIGURE B.5. 
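
To make the meaning of the VPULSE settings concrete, the sketch below evaluates an idealized pulse waveform from the seven parameters listed in the pulse-generator step. It is a plain Python illustration, not PSpice code: the function name `vpulse` is hypothetical, and a real simulator may replace zero rise and fall times with a small nonzero value.

```python
def vpulse(t, v1=0.0, v2=5.0, td=5e-3, tr=0.0, tf=0.0, pw=5e-3, per=10e-3):
    """Value (volts) of an idealized periodic pulse source at time t (seconds).

    Parameter names mirror the VPULSE fields: initial level v1, pulsed level v2,
    delay td, rise time tr, fall time tf, pulse width pw, and period per.
    """
    if t < td:
        return v1                        # before the delay expires
    phase = (t - td) % per               # position inside the current period
    if phase < tr:
        return v1 + (v2 - v1) * phase / tr                 # rising edge
    if phase < tr + pw:
        return v2                                          # flat top
    if phase < tr + pw + tf:
        return v2 + (v1 - v2) * (phase - tr - pw) / tf     # falling edge
    return v1                                              # rest of the period


if __name__ == "__main__":
    for t_ms in (0, 2, 6, 12, 16, 22):
        print(f"t = {t_ms:2d} ms -> {vpulse(t_ms * 1e-3):.1f} V")
```

With the default arguments the source stays at 0 V for the first 5 ms and then alternates between 5 ms at 5 V and 5 ms at 0 V, which is the 50% duty cycle square wave described in the text.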


2. Simulation 


a. Create a simulation profile. Still in the same OrCAD Capture screen, select PSpice > New Simulation 
Profile or click the corresponding toolbar button. In the dialog box, select or type: 

Analysis Type: Time domain (Transient) 
Run to Time: 25 ms 
Maximum Time Step: 10 µs 

b. Run the simulation by selecting PSpice > Run or by clicking the Run button. 

c. After PSpice A/D concludes the simulation, the oscilloscope-like (PROBE) screen is opened 
automatically. This situation is exactly the same as that in step 2(c) of Part I (Figure B.3), so go to 
step 2(d) of Part I and proceed from there. 
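
For readers who want a feel for what the transient analysis computes, the following Python sketch steps the node-2 equation of the example circuit with a fixed 10 µs step up to 25 ms. It is only an illustration under stated assumptions: R1 = R2 = 1 kΩ, C1 = 1 µF, an ideal 0 V / 5 V pulse input, and a simple backward Euler update, which is not claimed to be the integration method PSpice uses.

```python
# Assumed circuit values and simulation-profile settings.
R1, R2, C1 = 1e3, 1e3, 1e-6
RUN_TO_TIME = 25e-3      # seconds
MAX_STEP = 10e-6         # seconds


def vin(t):
    """Ideal 0 V / 5 V pulse: 5 ms delay, 5 ms width, 10 ms period."""
    if t < 5e-3:
        return 0.0
    return 5.0 if (t - 5e-3) % 10e-3 < 5e-3 else 0.0


def simulate():
    """Return (times, V(2) samples) for the RC circuit using backward Euler."""
    n_steps = round(RUN_TO_TIME / MAX_STEP)
    g = 1.0 / R1 + 1.0 / R2              # total conductance seen at node 2
    v2 = 0.0                             # capacitor initially discharged
    times, v2_values = [0.0], [v2]
    for k in range(1, n_steps + 1):
        t = k * MAX_STEP
        # Node equation: C1*(v_new - v_old)/h = vin(t)/R1 - g*v_new
        v2 = (C1 * v2 / MAX_STEP + vin(t) / R1) / (C1 / MAX_STEP + g)
        times.append(t)
        v2_values.append(v2)
    return times, v2_values


if __name__ == "__main__":
    times, v2_values = simulate()
    for index in (550, 1050):            # samples near 5.5 ms and 10.5 ms
        print(f"t = {times[index] * 1e3:.2f} ms, V(2) = {v2_values[index]:.2f} V")
```

The samples near 5.5 ms and 10.5 ms should land close to the 1.58 V and 0.92 V values expected from the 0.5 ms time constant, with a small discrepancy due to the fixed step size.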
494-495 
open-drain, 86-87 
regular, 84-85 
timing diagrams, 76-77 
tri-state, 85-86, 258, 259 
burst error-correcting codes, 159 
bus drivers (circuits), 85, 86 
bypassing carry circuit, 296 
bytes, 5 
CMOS flip flops, 333, 334 
C²MOS latches, 324, 328 
California Institute of Technology, 2 
CAM (content-addressable memory) for 
cache memories, 446-447 
capacitive power consumption, 73 
car alarms 
basic alarm, 578-581 
with debounced inputs, 581-583 
with debounced inputs and ON/OFF 
chirps, 583-587 
FSM approach, 430 
carry and binary addition, 27, 47-48 
carry delay, 296 
carry-lookahead adders, 297-300, 540-543 
carry-ripple adders, 291-292, 511-512, 
539-540 
carry-select adders, 297 
carry-skip (carry-bypass) adders, 296 
CASE statement, VHDL, 504 
CCFF flip flops, 333, 343 
CDs and RS codes, 159-161 
channels 
MOSFET construction, 199 
n-channel MOS, 69-71, 199 
p-channel MOS, 69-71, 199 
width-to-length ratio, 199 
CIRC (cross-interleaved Reed-Solomon 
code), 160 
circuit responses 
AC responses, 18, 191-192, 209 
concepts, 16 
DC responses, 17, 185-189, 202-205 
transient response, 17-18, 189-191 
circuits, digital. See also combinational 
arithmetic circuits; 
combinational logic circuits; 
sequential circuits 
analog circuits vs., 4-5 
combinational vs. sequential, 10 
concepts, 6-9 
integrated circuits, 10-11 
simulation with SPICE, 19-20, 621-622 
for SOP and POS equations, 112-117 
circular right shifter, 276 
circular shift (rotation) 
described, 53, 275 
shifters, 275-277 
circular timers, 376-377 
CLB (configurable logic block), 479-480 
clock dividers, VHDL testbench 
simulations, 608-611, 613-614, 
616-617 
clock gating, 343 
clock generators, 5, 277-278 
clock management and FPGAs, 483-485 
clock signals, 9 
clock skew, 277-278, 333-335 
clock transitions, slow, 334-335 
clocked registers, 9 
clocked-CMOS (C²MOS) logic, 234 
CML (current-mode logic), 225 
CMOS (complementary MOS) 
BiCMOS logic, 232 
circuits, 227 
clocked-CMOS logic, 234 
CMOS-TTL interface, 228 
concepts, 10-11, 71 


fan-in/fan-out, 229 
HC/HCT CMOS families, 227-228 
inverter logic, 72-73, 206 
I/O standards, 237 
low-voltage CMOS, 229-230 
noise margins, 229 
physical vs. logic values, 14 
power consumption, 73-74, 230 
power-delay product, 230 
supply voltage/signal voltages, 229, 
236-240 
transmission-gate logic, 231-232, 259 
code spectrum, line codes, 135-136 
codes. See binary representations; error 
detecting/correcting; line codes 
combinational arithmetic circuits 
ALU-based unsigned/signed 
multipliers, 311-312 
ALUs, 306-307, 547-549 
arithmetic-logic unit, 547-549 
bit-serial adders, 300-301 
carry-lookahead adders, 297-300, 
540-543 
carry-ripple adders, 291-292, 539-540 
carry-select adders, 297 
carry-skip adders, 296 
vs. combinational logic circuits, 
258, 289 
comparators, 304-305 
decrementer, 303, 304 
dividers, 312 
fast adders, 293-300 
fast adders, approaches for, 294-295 
faster carry propagation, 295 
four-bit carry-lookahead circuit, 
299-300 
full-adder units, 290-291 
generate, propagate, and kill signals, 
293-294 
incrementer, 303, 304 
Manchester carry-chain adders, 
295-296 
multibit adders, 291-292 
multipliers, 307-312 
parallel signed multipliers, 309 
parallel unsigned multipliers, 
308-309 
parallel-serial multiplier functional 
analysis, 310 
parallel-serial multiplier timing 
analysis, 310-311 
parallel-serial unsigned multipliers, 
309-311 
signed adders/subtractors, 301-303 
signed and unsigned adders/ 
subtractors, 543-544 
signed and unsigned multipliers/ 
dividers, 545-546 
signed vs. unsigned adders, 301 
signed/unsigned addition, 292 
subtractors, 301-303 
two’s complementer, 303-304 
combinational logic circuits. See also 
combinational arithmetic 
circuits; sequential circuits 
address decoders, 262-263 
address decoders with enable, 264 
address encoders, 266-268 
BCD-to-SSD converter, VHDL, 
525-527, 558-560 
binary sorter, 274-275 
vs. combinational arithmetic circuits, 
258, 289 
compound gates, 259-262 
described, 87 
generic address decoder, VHDL, 
523-525 
generic multiplexer, VHDL, 527-528 
generic priority encoder, VHDL, 529 
large address decoders, 264-265 
majority/median functions, 47, 
274-275 
memories, 280 
multiplexers, 268-272 
nonoverlapping clock generators, 
277-278 
parallel logic translators, 262 
parity detectors, 272 
POS-based CMOS circuit, 260-262 
priority encoders, 272-274 
propagation delays and glitches, 
123-124 
ROM design, VHDL, 530-532 
Schmitt triggers, 279-280 
vs. sequential circuits, 10, 257-258 
shifters, 275-277 
short-pulse generators, 278 
SOP-based CMOS circuit, 260 
standard circuits for SOP and POS 
equations, 112-117 
synchronous RAM design, VHDL, 
532-535 
system resolution and glitches, 
410-411 
timing diagrams, 75-77, 78-79, 
265-266 
commercial temperature range, TTL 
chips, 222 
common-cathode SSDs, 525, 558 
common-emitter circuits, DC response 
examples, 186-189 
common-source circuit, 202-203 
Common-term theorem, 105, 106 
compact MOS models, SPICE, 627, 628 
comparators, 304-305 
comparison operators, VHDL, 500 
compiling simulations, 659, 665-666 
complementary MOS. See CMOS 
compound gates, 259-262 
compressive strain, 211 
concatenated RSV (Reed-Solomon- 
Viterbi) decoder, 167 
concatenation operators, VHDL, 499 
concurrent code, VHDL, 493-494, 
501-503 
conditional capture, 343-344 
Consensus theorem, 105 
CONSTANT object, VHDL, 506-507 
constraint length and convolutional 
codes, 163 
contamination delays 
D-type flip-flops, 330-331 
D-type latches, 322-323 
continuous-valued signals, 4 
control characters, 35, 37 
control ports, DFFs, 344-345 
convolutional codes, 163-167 
constraint length, 163 
decoders, 167 
FSM model, 164-166 
minimum free distance, 165 
serial bits, 163 
transitions, 165 
trellis diagrams, 165-166 
Voyager spacecraft 
implementation, 164 
Viterbi decoder, 167-169 
correcting errors. See also error 
detecting/correcting 
concepts, 153-154 
convolutional codes, 163-167 
Hamming codes, 156-159 
low density parity check codes, 
171-174 
Reed-Solomon codes, 159-161 
turbo codes, 170-171 
Viterbi decoder, 167-169 
counters 
asynchronous. See asynchronous 
counters 
design techniques, 399 
downward asynchronous, 92-93, 
368-369, 655 
D-type flip-flops, 91-93, 357-358, 
362-364 
FSM synchronous 0-to-7 counter, 
402-404 
FSM synchronous 3-to-9 counter, 
404-406 


FSM synchronous Gray-encoded 
counter, 425-426 
FSM synchronous one-hot encoded 
counter, 424-425 
full-scale/partial-scale, 91, 355 
introduction, 91 
partial-scale, 91 
synchronous. See synchronous 
counters 
VHDL, 508-509 
CPLDs (complex programmable logic 
devices) 
Altera, 477-478 
architecture, 471-475 
concepts, 15, 18-19 
Xilinx, 475-477 
cutoff region, transistors, 184-185 
CVSL (cascode voltage switch logic) 
latches, 324, 326, 328 
cyclic Hamming codes, 159 
cyclic redundancy check (CRC) codes, 
155-156 


D 
data frames and RS codes, 160 
data types, VHDL, 496-498 
data-stable requirements 
D-type flip-flops, 330-331 
D-type latches, 322-323 
DC component, line codes, 135 
DC (direct current) responses 
bipolar transistors, 185-189 
introduction, 17 
MOSFETs, 202-205 
SPICE analysis, 622, 630-631 
DCCER flip flops, 333, 343 
decimal conversion to binary, 22-23 
decimal range 
binary code, 22 
one’s complement code, 27 
sign-magnitude code, 26 
decoders (combinational circuits) 
address decoders, 262-263 
address decoders with enable, 264 
generic address decoder, VHDL, 
523-525 
large address decoders, 264-265 
seven-segment display, 266-268 
decoders 
bipolar codes, 139 
concatenated RSV, 167 
convolutional codes, 167 
cyclic redundancy check codes, 
155-156 
error detecting/correcting concepts, 
153-154 
Fano algorithm, 167 


Hamming codes, 158-159 
line codes, 133-136. See also specific 
line codes 
low density parity check codes, 171-174 
mB/nB codes, 140-143 
MLT codes, 140 
PAM codes, 143-148 
polar codes, 138-139 
Reed-Solomon codes, 159-161 
sequential, 167 
single parity check codes, 154-155 
turbo codes, 170-171 
unipolar codes, 137-138 
Viterbi decoder, 167-169 
decrementer, 303, 304 
degenerate truth tables, 108 
delays. See also propagation delays 
carry delay, 296 
transient responses, 17-18, 189-191 
DeMorgan’s law, 105, 106 
deracer, 331, 340, 342 
descramblers. See scramblers/ 
descramblers 
Design Compiler RTL Synthesis 
(Synopsys), 492 
design file, VHDL, 601-602, 613 
DESPFF flip flops, 333, 342, 343 
detecting errors. See error detecting/ 
correcting 
DE/TG-C²MOS flip flops, 333, 342 
DFFs. See D-type flip-flops 
differential amplifier, 240 
differential Manchester codes, 140 
digital circuits. See circuits, digital 
digital logic families, 219. See also logic 
families 
digital-to-analog converter (DAC), 4 
diode-resistor (DR) circuit, SPICE 
simulation, 638-639, 646-647 
DIP (dual in-line package), 11, 12 
directories for simulations, 664-665 
discrete states (digital), 4 
discrete-valued signals, 4 
disk (channel) bits, audio CD 
encoding, 161 
displays. See LCDs; SSDs 
divide-by-5 with symmetric phase FSM, 
422-423 
dividers 
combinational arithmetic circuits, 312 
frequency, 374-377 
divine proportion, 561 
division 
as addition/subtraction plus shift 
operations, 58 
ALU-based algorithms, 57-58 
floating-point, 62 
signed division, 58-59 
unsigned, 57-58 
using arithmetic shift, 53 
using logical shift, 53 
DLs. See D-type latches 
DTL (diode-transistor logic), 220-221 
DC response example, 188-189 
junction voltage, 220 
short/open circuits, 220 
domino logic, 233-234, 262 
donor-type dopant, 181, 197 
“don’t care” values 
Karnaugh maps, 119 
truth tables, 108 
doping/dopants, semiconductors, 2, 
181-182, 197-198 
double-precision floating-point, 
31-32, 59 
downward asynchronous counters, 
92-93, 368-369, 655 
drain, MOSFETs, 198 
DRAM (dynamic random access 
memory) 
chip architecture, 440-441 
circuits, 439 
memory-read, 439-440 
memory-write, 440 
refresh, 441 
sense amplifier, 441-442 
drivers, 85 
DSETL flip flops, 333, 340, 341 
DSP (digital signal processing) blocks, 
482-483 
DSPFF flip flops, 333, 342, 343 
DSTC flip flops, 333, 336, 337 
DSTC1/2 latches, 324, 328 
DTG flip flops, 333, 334 
DTG1/2 latches, 324, 327, 328 
D-type flip-flops (DFFs) 
with clear, 345 
concepts, 9, 87-89 
construction approaches, 331-332 
contamination delays, 330-331 
control ports, 344-345 
counters, 91-93, 355-371 
data-stable requirements, 330-331 
dual-edge D flip-flops, 342-343 
with enable, 345 
frequency divider, 9, 89 
functional analysis, 329-330 
list of, 333 
master-slave DFFs, 331, 332-338 
with NAND gates, 88 
operation, 329-330 
positive/negative-edge triggered, 
87-88, 89, 329-330 


propagation delays, 330-331 
pseudo-random sequence generator, 
93-94, 381-383 
pulse-based/latched, 332, 338-342 
with reset and preset, 344-345 
semidynamic, 337, 341 
shift registers, 89-90, 553-556 
SPICE simulation, 648-650 
statistically low-power DFFs, 
343-344 
symbol and truth table, 9, 87 
synchronous modulo-2^N counters, 
357-358 
synchronous modulo-M counters, 
362-364 
time-related parameters, 330-331 
timing diagram, 88, 331 
VHDL flip-flop inference, 508-509 
D-type latches (DLs) 
cascode voltage switch logic, 325, 326 
clocked-CMOS logic, 234 
contamination delays, 322-323 
data-stable requirements, 322-323 
dynamic DLs, 327-329 
functional analysis, 322 
list of, 324 
operation, 320-322 
positive/negative, 321 
propagation delays, 322-323 
SPICE simulation, 642-644 
static current-mode, 327 
static multiplexer-based, 324-325 
static RAM-type, 326-327 
static ratio insensitive, 325, 326 
time-related parameters, 322-323 
timing analysis, 323 
dual data rate (DDR) SDRAMs, 
444-445 
dual data rate (DDR) SRAMs, 438-439 
dual-edge D flip-flops, 342-343 
duality principle, 105, 106-107 
converting to NAND-only circuits, 
112-113 
dual-modulus prescaler, 379-380 
DUT (design under test), 601-603. See 
also simulation with VHDL 
duty cycles, 15-16 
DVDs and RS codes, 159-161 
dynamic D-type latches (DLs), 323, 324, 
327-329 
dynamic logic, 232-233 
dynamic MOS architectures, 
232-234 
dynamic power consumption, 73 
dynamic random access memory. See 
DRAM 
E 
ECL (emitter-coupled logic), 225-226 
ECL flip flops, 333, 336, 338 
ECL latches, 324, 325, 327, 336 
EDA (electronic design automation) 
tools, 492 
EEPROM (electrically erasable 
programmable ROM), 455-456 
effective mobility, MOSFETs, 202 
encoders 
biphase/Manchester codes, 139-140 
bipolar codes, 139 
convolutional codes, 163-167 
cyclic redundancy check codes, 
155-156 
error detecting/correcting concepts, 
153-154 
Hamming codes, 158-159 
line codes concepts, 133-136 
low density parity check codes, 
171-174 
mB/nB codes, 140-143 
MLT codes, 140 
NRC, 170 
PAM codes, 143-148 
polar codes, 138-139 
Reed-Solomon codes, 159-161 
RSC, 171 
single parity check codes, 154-155 
turbo codes, 170-171 
unipolar codes, 137-138 
Viterbi decoder, 167-169 
encoders, combinational circuits 
address encoders, 266-268 
priority encoders, 272-274 
encoding scheme, VHDL, 518 
encoding styles, FSMs, 423-426 
Encounter RTL (Cadence), 492 
energy recovery, 343-344 
entity, VHDL, 493 
enumerated data types, 498 
EPROM (electrically programmable 
ROM), 453-455 
EP/TG-C²MOS flip flops, 333, 339, 340 
equivalent circuits, Boolean algebra, 
107 
error detecting/correcting 
audio CD encoding, 159-161 
burst codes, 159 
CIRC, 160 
codes, concepts, 153-154 
convolutional codes, 163-167 
cyclic redundancy check (CRC) 
codes, 155-156 
Hamming codes, 156-159 
interleaving, 161-162 
low density parity check (LDPC) 
codes, 171-174 
nonsystematic codes, 157 
parity-check equations, 155, 
157-158, 172 
Reed-Solomon codes, 159-161 
single parity check (SPC) codes, 
154-155 
systematic codes, 154 
turbo codes, 170-171 
Viterbi decoder, 167-169 
Ethernet applications, 134-135 
ETOX flash memory, 456-458 
E-TSPC flip flops, 333, 335, 336, 337 
Euclidean distances, 144-146 
evaluation/precharge phases, dynamic 
logic, 232 
even parity, 82, 287 
event-related attributes, VHDL, 501 
explicit-pulsed (EP), 339 
extension, sign, 29-30, 51-52 


F 
Fairchild Camera, 2-3 
Fairchild Semiconductor, 2-3 
fall time, 17, 190 
fan-in/fan-out, 224, 229 
Fano algorithm, 167 
fast adders, 293-300 
approaches for, 294-295 
carry-lookahead adders, 297-300, 
540-543 
carry-select adders, 297 
carry-skip adders, 296 
four-bit carry-lookahead circuit, 
299-300 
generate, propagate, and kill signals, 
293-294 
Manchester carry-chain adders, 
295-296 
faster carry propagation/ 
generation, 295 
FBGA (fine-pitch ball grid array), 11, 12 
FBGA Flip-Chip, 11, 12 
Fibonacci, Leonardo, 561 
Fibonacci series generators, 561-562 
field programmable gate arrays. See 
FPGAs 
field-effect transistor. See MOSFETs 
finite state machines. See FSMs; state 
machines 
fixed ions and semiconductors, 181, 197 
flash memory 
ETOX cells, 456-457 
ETOX programming, 457-458 
multibit, 460-461 


NOR vs. NAND, 459-460, 461 
SONOS cells, 458 
specifications, 461 
split-gate cells, 458 
flip-flops 
DFFs. See D-type flip-flops 
JK flip-flop, 319, 348 
SR flip-flop, 319, 348 
T flip-flops, 345-347 
types of, 87 
floating-point (FP) operations, 59-62 
floating-point (FP) representation 
biased/actual exponent, 30-31, 59 
conversion examples, 32-33 
double-precision floating-point, 
31-32, 59 
hypothetical floating-point system, 
34-35 
IEEE 754 standard, 30-33, 59 
vs. integer, 33-35 
normalized scientific notation, 30-31 
precision, 33 
single-precision floating-point, 
30-33, 59 
footed dynamic logic, 233 
Fourier analysis, 622 
FPGA Advantage (Mentor Graphics), 
492 
FPGAs (field programmable gate 
arrays) 
additional features, 485 
architecture, 479-480 
clock management, 483-485 
concepts, 15, 18-19 
DSP blocks, 482-483 
I/O standards, 485 
RAM blocks, 481-482 
Stratix LAB and ALM, 481 
summary and comparison, 485-486 
technology, 478-479 
Virtex CLB and Slice, 480 
FR-4 (flame resistant category 4), 13 
FRAM (ferroelectric RAM), 462-463 
frames, audio CD encoding, 161 
free charge and semiconductors, 181, 197 
free hole and semiconductors, 182, 198 
frequency dividers 
circuits with multiple dividers, 
376-377 
divide-by-2, 374 
divide-by-9 with symmetric phase, 
375-376 
divide-by-M with symmetric phase, 
374-376 
D-type flip-flops, 89 
introduction, 9, 374 


prescalers, 377, 379-381 
symmetric-phase design, 421-423 
timers, 376-377 
frequency meters, 562-565 
frequency response, 191, 209, 622, 
634, 644 
FSMs (finite state machines) 
basic FSM, 399-401, 518-519 
car alarm, VHDL, 578-587 
circuit with synchronized machines, 
418-419 
with complex combinational logic, 
414-416 
convolutional codes, 164-166 
counter 0-to-7, 402-404 
counter 3-to-9, 404-406 
counter Gray-encoded, 425-426 
counter one-hot encoded, 424-425 
design techniques, 399 
divide-by-5 with symmetric phase, 
422-423 
encoding styles, 423-426 
generic signal generator design 
technique, 419-421 
Gray encoding, 424, 425-426 
large FSM design, 411-414 
large-window signal generator, 
412-414 
LCD driver, VHDL, 588-597 
Mealy vs. Moore machines, 399 
model, 397-399 
multi-machine designs, 417-419 
one-hot encoding, 424-425 
pulse width modulator, 415-416 
sequential binary encoding, 423-424 
signal generators, 408-410, 412-414 
smallest and simplest FSM, 401-402 
state transition diagram, 164, 397, 516 
string detectors, 406-408 
string detectors, VHDL, 573-575 
symmetric-phase frequency dividers, 
421-423 
synchronism, 417 
synchronous 3-to-9 counter, 
404-406 
system resolution and glitches, 
410-411 
two-hot encoding, 424 
“universal” signal generator, VHDL, 
575-578 
VHDL template, 516-519 
full-adder (FA) units, 290-291 
full-bench timing analysis, 602, 615 
full-scale counters, 91, 355 
functional timing diagrams, 75 
functions 
Boolean, 103-105 
VHDL, 513-514 


G 
GaAs (gallium arsenide), 2, 182, 198 
GaAs-AlGaAs HBT, 193-194 
GAL (generic PAL) devices, 471 
Gallager, Robert, 171 
garage door controller, 430-431 
gates 
Boolean functions, 103-104 
buffer logic gates, 8 
compound, and combinational logic 
circuits, 259-262 
digital, described, 8 
gate-level digital circuits, 6-7 
gate-level vs. transistor-level 
analysis, 20 
AND gates, 8, 77-78, 258, 259 
MOSFETs, 200 
NAND gates, 8, 77-78, 258 
NOR gates, 8, 79-80, 259 
OR gates, 8, 79-80, 259 
XOR gates, 8, 81-83, 259 
Gauss-Jordan transformation, 158 
GCFF flip flops, 333, 343 
Ge (germanium), 2, 182, 198 
GENERATE statement, VHDL, 502-503 
generator matrix, 158 
generator polynomial and CRC codes, 
155-156 
GENERIC MAP declaration, VHDL, 512 
GF-TSPC flip flops, 333, 336, 337 
glitches 
propagation delays, 123-124 
system resolution, 411 
golden ratio, 561 
Gray code, 24, 25 
Gray encoding for FSMs, 424, 425-426 
Greek numbers, 37 
group generate/propagate, 294, 298 
Grove-Frohman MOSFET model, 
SPICE, 628 


H 
Hamming codes, 156-159 
encoding-decoding procedure, 
158-159 
generator matrix, 158 
minimum Hamming distance, 157 
parity-check equations, 157-158 
parity-check matrix, 158 
rate, 157 
syndrome decoding procedure, 
158-159 
systematic/nonsystematic codes, 157 


Hamming distances, 146 
hard decisions 
LDPC decoders, 168 
Viterbi decoders, 168 
hardware description language, 19, 492 
hardware programmable ICs, 15, 18-19 
HBT (heterojunction bipolar transistor), 
193-194 
HC/HCT CMOS families, 227-228 
HD44780U (Hitachi) LCD controller, 
588-591 
hexadecimal code, 24, 25-26 
hexadecimal-to-decimal example, 24 
high-to-low propagation delays, 17, 
123-124 
HLFF/Partovi flip flops, 333, 339, 
340, 341 
HSpice (Synopsys), 621. See also 
simulation with SPICE 
HSTL (high-speed transceiver logic) 
standards, 244 


I 
IEEE 754 standard, 30-33, 59 
ieee library, 496 
IF statement, VHDL, 504 
implementation complexity, 
line codes, 136 
implicants and SOP, 109 
implicit-pulsed (IP), 339 
inactivity factor, 392, 563 
incrementer, 303, 304 
independent DC/AC sources, SPICE, 
630-634 
industrial temperature range, TTL 
chips, 222 
input/output. See I/O standards 
integers 
codes for, 21-26 
floating-point vs., 33-35 
negative, binary representation, 
26-30 
VHDL data types, 496-498 
integrated circuits (ICs) 
concepts, 10-11 
examples, 12 
levels of abstraction, 7 
programmability, 15 
Intel Corporation, historical notes, 3 
Intel PCI Express, 246-247 
interleaving, 161-162 
audio CD encoding, 160 
block interleaver, 162 
compared to scrambling, 162 
pseudo-random interleaver, 162 
interpolation, 161 


inversion functions, 6, 8, 103 
inverted logic, 8, 267, 284 
inverters, 8 
Boolean functions, 103-104 
DC response, 186-189 
nMOS/pMOS operation, 70, 216, 217 
summary, 258, 259 
inverters, CMOS 
CMOS logic, 72-73, 206 
concepts, 258, 259 
function and properties, 71-72 
logic voltages, 75 
MOSFETs, 205-207 
n-well, 205 
parameters, 206-207 
power consumption, 73-74 
power-delay product, 74-75, 208 
SPICE simulation, 639-642 
switching frequency, 74 
symbol and truth table, 72 
timing diagrams for combinational 
circuits, 75-77, 78-79 
transition voltage, 206 
I/O buses and synchronous RAM 
design, VHDL, 532-535 
I/O standards 
CMOS, 237 
compared to logic families, 235 
concepts, 235-236 
differential, 235-236 
HSTL, 244 
LVCMOS, 237-240 
LVDS, 244-247 
LVTTL, 236-237 
single-ended, 235-236 
single-ended voltage-referenced 
terminated, 235-236 
SSTL, 240-244 
TTL, 236 
IP-SRAM flip flops, 333, 342 
irreducible SOP/POS. See POS; SOP 
irregular waveforms for simulations, 
603-605 
irregular/regular LDPC codes, 171 
ISE (Xilinx), 492 
Itanium-based flip flops, 333, 
340, 341 
I-V characteristics for bipolar 
transistors, 184-185 
collector signals, 184 
cutoff/active/saturation regions, 
184-185 
I-V characteristics for MOSFETs, 
201-202 
effective mobility, 202 
saturation curve, 201-202 
saturation/linear/subthreshold 
regions, 201 


J 
jamb latch, 324, 325, 326 
JEDEC standards 
CMOS and LVCMOS, 237-240 
HSTL, 244 
SSTL, 240-244 
TTL and LVTTL, 236 
JK flip-flop, 319, 348 
junction voltage, diodes, 220 


K 
K6 flip flops, 333, 340, 341 
Karnaugh maps 
Absorption theorem, 120 
described, 117 
essential implicants, 118 
large maps, 120-121 
minimum (irreducible) SOP, 119 
prime implicants, 118 
truth tables, 117, 121 
for zeros, 120 
Kilby, Jack, 3 
kill signals, 293-294 
KS0066U (Samsung) LCD controller, 
588-591 


L 
lanes, PCIe, 247 
latches 
DL circuits, 323-324 
DL operation, 320-322 
dynamic DLs, 327-329 
SR latch, 320 
static current-mode DLs, 327 
static multiplexer-based DLs, 
324-325 
static RAM-type DLs, 326-327 
time-related parameters, 322-323 
types of, 87 
layers 
pseudo three-layer circuit, 81 
two-layer circuit, 80 
LCDs (liquid crystal displays) 
controllers, 588-591 
SSD decoders, 266-268, 525-527, 
558-560 
with “VH” and “DL” in two 
lines, 597 
with “VHDL” blinking, 595-597 
with “VHDL” written in first line, 
591-595 
LDPC. See low density parity check 
(LDPC) codes 


leading zeros, VHDL, 505-506 
least significant bit (LSB), 5, 21 
LEDs (light emitting diodes), 266 
Leonardo Spectrum (Mentor 
Graphics), 492 
level-sensitive memory circuits, 320 
libraries, VHDL 
declarations, 492 
ieee library, 496 
packages, 495-496 
std library, 495 
line codes. See also binary 
representations 
4B/5B, 134, 140-141 
8B/10B, 141-143 
4D-PAM5, 134, 143-148 
biphase/Manchester codes, 139-140 
bipolar codes, 139 
code spectrum, 135-136 
concepts, 134 
convolutional encoding, 146 
DC component, 135 
encoder-decoder pair, 134 
Ethernet applications, 134-135 
Euclidean distances, 144-146 
Hamming distances, 146 
implementation complexity, 136 
Manchester, 139-140 
maximum run length, 136 
mB/nB codes, 140-143 
MLT codes, 140 
PAM codes, 143-148 
parameters and types of, 135-137 
parity bits, 144, 146-147 
polar codes, 138-139 
sublattices, 146 
transition density, 136 
types of, 136-137 
unipolar codes, 137-138 
use of, 133 
line drivers, 85 
linear region, MOSFETs, 201 
linear-feedback shift register (LFSR), 
93-94, 381-383 
load line, 185-187, 202, 204-205 
logic families 
BiCMOS logic, 232 
BJT-based, concepts, 219-220 
clocked-CMOS logic, 234 
CMOS logic, 227-230 
CMOS-TTL interface, 228 
compared to I/O standards, 235 
diode-transistor logic, 220-221 
domino logic, 233-234, 262 
dynamic logic, 232-233 
dynamic MOS architectures, 232-234 
emitter-coupled logic, 225-226 
HC/HCT CMOS families, 227-228 
MOS-based, 226-227 
pseudo-nMOS logic, 230-231, 262 
static MOS architectures, 230-232 
transmission-gate logic, 231-232 
TTL, 221-225 
logic gates 
compound gates, 259-262 
concepts and circuits, 258, 259 
introduction, 7-8 
logic vs. physical values, 13-14 
logical circuits, 10 
logical operators, VHDL, 499 
logical right rotator, 287 
logical shift 
described, 52, 275 
division using, 53 
multiplication using, 53-54, 55 
shifters, 275-277 
long words, 5 
LOOP statement, VHDL, 505 
low density parity check (LDPC) codes, 
171-174 
4-pass cycles (short cycles), 172 
belief-propagation algorithms, 173 
combinational designs, 173 
decoding algorithms, 173 
hard/soft decisions, 173 
information passing, 173 
logarithms, 173 
majority functions, 173 
message-passing algorithms, 173 
parity-check matrix, 171 
parity-check sparsity, 171 
probabilities, 173 
random designs, 173 
regular/irregular code, 171 
sum-product algorithms, 173 
Tanner graphs, 172 
low-to-high propagation delays, 17, 
123-124 
low-voltage CMOS, 229-230 
I/O standards, 237-240 
low-voltage TTL (LVTTL), 236-237 
LQFP (low-profile quad flat pack), 11, 12 
LVCMOS (low-voltage CMOS), 75 
LVDS (low-voltage differential 
signaling) standards, 244-247 
M-LVDS, 246 
operating modes, 246 
PCI Express bus example, 246-247 


M 
majority functions 
binary arithmetic, 47 
binary sorter example, 274-275 
full-adder units, 290 
LDPC codes, 173 
make code and SPC codes, 154-155 
Manchester carry-chain adders, 295-296 
Manchester codes, 139-140 
master-slave D flip-flops, 331, 332-338 
classical implementations, 332-334 
clock skew and slow clock 
transitions, 334-335 
high-performance, 335-338 
maxterms 
defined, 110 
expansion and irreducible POS, 
111-112 
expansion, defined, 111 
and minterms, 111 
POS equations, 110-112 
truth tables for maxterm 
expansion, 112 
mB/nB codes, 140-143 
4B/5B codes, 140-141 
8B/10B codes, 141-143 
running disparity, 142 
Mealy machines, 399 
median functions, 47, 274-275 
memories 
CAM for cache memories, 446-447 
combinational logic circuits, 280 
digital circuits, 8-9, 10 
DRAM, 439-442 
dual and quad data rate SRAMs, 
438-439 
dual data rate SDRAMs, 444-445 
EEPROM, 455-456 
EPROM, 453-455 
flash, 456-461 
FRAM, 462-463 
MP-ROM, 452-453 
MRAM, 463-464 
next-generation nonvolatile, 461-465 
nonvolatile, 451-465 
OTP-ROM, 453 
PRAM, 464-465 
RAM blocks and FPGAs, 481-482 
ROM design, VHDL, 530-532 
SDRAM, 442-444 
SRAM, 434-437 
synchronous RAM design, VHDL, 
532-535 
types of, 433-434, 451-452 
Viterbi decoder, 168 
volatile, 433-447 
message-passing algorithms, 173 
military temperature range, 
TTL chips, 222 


minimum Hamming distance, 157, 165 
minterms 
defined, 108 
expansion and irreducible SOP, 110 
expansion and Quine-McCluskey 
algorithm, 122 
expansion, defined, 109 
and maxterms, 111 
prime implicants, 109 
SOP equations, 108-110 
truth tables for minterm expansion, 
109, 110 
MLT (multilevel transition) codes, 140 
MLT-3 codes, 140 
M-LVDS (multipoint LVDS), 246 
ModelSim (Mentor Graphics), 19, 492 
creating new projects, 664-666 
simulation procedure, 657-663 
tutorial, 657-666 
ModelSim-Altera Web Edition, 492 
modular circuit, binary sorter, 274 
modulo-2 adders, 83-84 
N-bit parity function, 83 
XOR gate, 83 
modulo-2 addition, 47 
monotonicity condition, 233 
Monte Carlo analysis, SPICE, 623, 
645-647 
Moore, Gordon, 2-3 
Moore machines, 399 
MOS transistors. See MOSFETs 
MOS-based architectures 
BiCMOS logic, 232 
clocked-CMOS logic, 234 
CMOS logic, 227-230 
concepts, 226-227 
domino logic, 233-234, 262 
dynamic architectures, 232-234 
dynamic logic, 232-233 
pseudo-nMOS logic, 230-231, 262 
static architectures, 230-232 
transmission-gate logic, 231-232, 259 
MOSFETs (metal oxide semiconductor 
field effect transistors) 
advantages, compared to BJTs, 198 
BiCMOS technologies, 211, 232 
channel width-to-length ratio, 199 
CMOS inverter, 205-207 
common-source circuit, 202-203 
compressive strain, 211 
construction, 198-199 
DC responses, 202-205 
effective mobility, 202 
introduction, 10-11, 69-71 
I-V characteristics, 201-202 
load line, 202, 204-205 
logic. See logic families 
negative/positive charges, 199 
nMOS, 69-71 
operation, 200 
pMOS, 69-71 
semiconductors, 197-198 
SOI MOSFETs, 211 
SPICE models, 627-630 
strained layers, 210 
strained Si-SiGe MOSFETs, 210-211 
tensile strain, 211 
threshold voltage, 200 
transient responses, 207-208 
transition frequency, 209 
most significant bit (MSB), 5, 21 
MP-ROM (mask-programmed read only 
memory), 452-453 
MRAM (magnetoresistive RAM), 
463-464 
MSB reflected code, 24, 425 
multibit adders, 291-292 
multibit flash, 460-461 
multibit waveforms for simulations, 
603, 605 
multidrop LVDS, 246 
multiplexers, 268-272 
basic, 269-270 
buffered, VHDL, 494-495 
with larger inputs, 270-271 
with more inputs, 271 
mux-based dual-edge D flip-flops, 
333, 342 
NAND-based, 269-270 
static multiplexer-based DLs, 324-325 
timing diagrams, 271-272 
VHDL, 527-528 
multiplication 
as addition plus shift operations, 
55, 57 
ALU-based algorithms, 55, 57 
Boolean algebra, 103 
Booth’s algorithm, 57, 312 
floating-point, 61-62 
modified Baugh-Wooley 
multiplier, 56 
signed, 56-57 
unsigned, 54-55 
using logical shift, 53-54, 55 
using logical shift and wider 
output, 54 
multiplicative scrambler-descrambler, 
384-386 
multipoint LVDS, 246 
multipliers, 307-312 
ALU-based unsigned/signed, 
311-312 
parallel-serial, functional 
analysis, 310 
parallel-serial, timing analysis, 
310-311 
parallel-serial unsigned, 309-311 
signed parallel, 309 
unsigned parallel, 308-309 
mux. See multiplexers 


N 
NAND gates, 8, 77-78, 258 
Boolean functions, 103-104 
CMOS circuit, 78 
concepts, 259 
DFF implementation, 88 
NAND latches, 321, 324, 325 
NAND-only circuits, 112-113, 115-117 
NASA and Voyager spacecraft, 164 
N-bit parity function, 83 
n-channel MOS, 69-71, 199 
negation functions, 6 
negative integers, binary representation, 
26-30 
negative-level DLs, 321 
neural networks (NNs), 565-571 
implementations, 566-567 
VHDL code, 567-571 
nibbles, 5 
nMOS inverters, 70, 216, 217 
nMOS transistors, 69-71 
Nobel Prize in Physics, 2, 3 
noise analysis, SPICE, 623 
noise margins 
CMOS logic, 229 
high/low, 75 
TTL, 225 
nominal mapping, VHDL, 510 
nonoverlapping clock generators, 277-278 
nonprogrammable ICs, 15 
nonrecursive scramblers-descramblers, 
383 
nonsystematic codes, 157 
nonsystematic solutions, 157 
nonvolatile memory. See memories 
NOR gates, 8, 79-80, 259 
Boolean functions, 103-104 
normalized scientific notation, 30-31 
NOT functions, 6, 8 
NOT logic gates, 8 
Noyce, Robert, 2-3 
npn/pnp transistors, 183 
NRC (nonrecursive convolutional) 
encoders, 170 
NRZ (nonreturn to zero) line codes, 
138-139 
NRZ-I (NRZ invert) line codes, 138-139 


number system examples, 25-26 
n-well, 205 


O 
octal code, 24, 25 
odd parity function, 47, 81 
full-adder units, 290 
one-hot encoding, 424-425 
one’s complement code, 26-27 
opcode (operation-select port), 306-307 
open-drain (OD) buffers, 86-87 
operators 
Boolean algebra, 103 
VHDL, 498-500 
OR function, 6, 103-104 
OR gates, 8, 79-80, 259 
binary addition vs. OR operation, 84 
Boolean functions, 103-104 
CMOS circuit, 80 
pseudo three-layer circuit, 81 
two-layer circuit, 80 
OTP-ROM (one-time programmable 
ROM), 453 
output layer, 80 
overflow, 48-49, 50 


P 
packages, VHDL, 509-510 
PAL (programmable array logic) 
devices, 468-470 
PAM (pulse amplitude modulation) 
codes, 143-148 
parallel enable, synchronous counters, 
355-358, 367-368 
parallel logic translators, 262 
parallel signed multipliers, 309 
parallel unsigned multipliers, 308-309 
parallel-serial unsigned multipliers, 
309-311 
parity bits 
error detecting/correcting codes, 154 
line codes, 144, 146-147 
low density parity check codes, 
171-174 
single parity check codes, 154-155 
parity detectors, 272, 502-503 
parity-check equations, 155, 
157-158, 172 
parity-check matrix, 158, 171 
parity-check sparsity, 171 
partial-scale counters, 91, 355 
p-channel MOS, 69-71, 199 
PCI Express bus example, 246-247 
PGA (pin grid array), 11, 12 
PGA2 Flip Chip, 11, 12 
physical vs. logic values, 13-14 


pins, ICs, 11-12 
pits and line codes, 138 
PLA (programmable logic array) 
devices, 470-471 
PLCC (plastic leaded chip carrier), 11, 12 
PLDs (programmable logic devices) 
concepts, 18-19, 467-468 
CPLDs, 15, 18-19, 471-478 
FPGAs, 15, 18-19, 478-486 
SPLDs, 468-471 
PLL (phase locked loop) circuits, 
377-381 
basic PLL, 378-379 
prescalers, 379-381 
programmable, 381 
pMOS transistors, 69-71 
pnp/npn transistors, 183 
point-contact transistors, 1-2 
point-to-point half-duplex LVDS, 246 
point-to-point simplex LVDS, 246 
polar codes, 138-139 
polysilicon-emitter BJT, 192-193 
POS (product-of-sums) 
complemented form, 120 
concepts, 80, 111 
maxterm expansion and irreducible 
POS, 111-112 
maxterms and POS equations, 
110-112 
POS-based CMOS circuit, 260-262 
standard circuits for SOP and POS 
equations, 112-117 
positional code, 21 
positional mapping, VHDL, 510-511, 
541, 554 
position-related attributes, VHDL, 501 
positive-edge triggered DFFs, 87-88, 89, 
342-343 
positive-level DLs, 321 
power consumption 
capacitive, 73 
inverter and CMOS logic, 73-74, 230 
short-circuit, 73-74 
static/dynamic, 73 
power-delay (PD) product, 74-75, 
208, 230 
PQFP (plastic quad flat pack), 11, 12 
PRAM (phase-change RAM), 464-465 
precharge/evaluation phases, dynamic 
logic, 232 
precision and FP representation, 33 
prescalers, 377, 379-381 
preset, DFFs, 344-345 
prime implicants 
Karnaugh maps, 118 
minterms, 109 
printed circuit boards (PCBs) 
concepts, 11, 13 
traces, 13 
priority encoders, 272-274, 529 
procedures, VHDL, 514-516 
programmable ICs, 15 
programmable logic devices. See PLDs 
programmable PLL (phase locked 
loop), 381 
propagate signals, 293-294 
propagation delays 
D-type flip-flops, 330-331 
D-type latches, 322-323 
glitches, 123-124 
high-to-low, 17, 123-124 
Karnaugh maps, 123-124 
low-to-high, 17, 123-124 
parallel-serial multiplier, 310-311 
transient responses, 17-18, 189-191 
PS/2 devices, 21, 154-155 
pseudo-morphic materials, 194 
pseudo-nMOS logic, 230-231 
pseudo-random interleaver, 162 
pseudo-random sequence generators, 
93-94, 381-383 
PSpice (Cadence), 621 
SPICE simulation with coded input, 
667-669 
SPICE simulation with graphic input, 
670-671 
PT (pass transistor), 231, 258, 259 
pulse response. See transient responses 
pulse-based dual-edge D flip-flops, 342 
pulse-based/latched D flip-flops, 
338-342 
implementations, 339-342 
short-pulse generators, 278, 338-339 
pulsed latch. See pulse-based/latched D 
flip-flops 
PWM (pulse width modulator), 390-391, 
415-416 


Q 

quad data rate (QDR) SRAMs, 438-439 
Quartus II Web Edition, 19, 492 
Quine-McCluskey algorithm, 121-122 
quotation marks, VHDL syntax, 5, 22 


R 

RAM. See memories 

randomization 
LDPC random designs, 173 
pseudo-random interleaver, 162 
pseudo-random sequence generator, 
93-94, 381-383 
range of values, 14 


range-related attributes, VHDL, 
500-501 
reading memory 
DRAM, 439-440 
SRAM, 434-435 
read-only memory (ROM). See memories 
receive/transmit data, 99 
reciprocals, 27 
recursive inputs and turbo codes, 170 
redundancy 
bits. See parity bits 
cyclic redundancy check codes, 
155-156 
Reed-Solomon (RS) codes, 159-161. 
See also audio CD encoding 
CIRC, 160 
concatenated RSV decoder, 167 
symbols, 159 
reference physical values, 13 
refresh, DRAM, 441 
registers 
clocked, 9 
D flip-flop control ports, 344-345 
D flip-flops, 9, 87-89, 329-332 
D latch, 320-329 
digital, described, 8-9 
dual-edge D flip-flops, 342-343 
JK flip-flop, 319, 348 
level-sensitive memory circuits, 
320 
master-slave D flip-flops, 331, 
332-338 
pulse-based D flip-flops, 338-342 
sequential vs. combinational logic, 
319-320 
shift register, 89-90, 553-556 
SR flip-flop, 319, 348 
SR latch, 320 
statistically low-power D flip-flops, 
343-344 
T flip-flops, 347 
regular waveforms for simulations, 
603-605 
regular/irregular LDPC codes, 171 
reset, DFFs, 344-345 
resistive-capacitive (RC) circuit, SPICE, 
644-645 
resolution, system, 410-411 
responses. See circuit responses 
rise time, 17, 190 
ROM. See memories 
rotation (circular shift) 
described, 53, 275 
shifters, 275-277 
RSC (recursive systematic 
convolutional) encoders, 171 
running (accumulated) disparity and 
line codes, 142 
running simulations, 659-660 
RZ (return to zero) line codes, 138-139 


S 
SAFF flip flops, 333, 336, 337-338 
sample and hold (S&H) circuit, 4 
sampling rate 
A/D conversion, 5 
audio CD encoding, 161 
saturation buildup, 190 
saturation curve, MOSFETs, 201-202 
saturation, transistors, 184-187, 201-202 
Scan Code Set 2, 21, 154 
Schmitt triggers (STs), 279-280 
Schottky (clamp) diode, 189-190, 223 
Schottky TTL, 223 
SCL flip flops, 333, 336, 338 
SCL latches, 324, 325 
scramblers/descramblers, 383-386 
additive (synchronous), 383-384 
introduction, 134, 143 
multiplicative (asynchronous), 
384-386 
pseudo-random sequence generator, 
381-383 
scrambling compared to 
interleaving, 162 
S-CVSL latches, 324, 325 
SDFF/Klass flip flops, 333, 340, 341 
SDRAM (synchronous DRAM), 
442-444 
semiconductors 
bipolar transistors, 181-182 
materials, 182, 198 
MOS transistors, 197-198 
negative/positive charges, 2, 181-182, 
197-198 
relaxed layers, 210 
straining, 194, 210 
semidynamic DFFs, 337, 341 
sense amplifier 
DRAM, 441-442 
SRAM, 436-437 
sensitivity/worst-case analysis, 
SPICE, 623 
sequential binary encoding, 423-424 
sequential circuits. See also 
combinational arithmetic 
circuits; combinational logic 
circuits 
asynchronous counters, 368-371 
bit-serial adders, 300-301 
vs. combinational circuits, 10, 257-258 
described, 87 
Fibonacci series generators, VHDL, 
561-562 
finite state machines. See FSMs 
frequency dividers, 374-377 
frequency meters, VHDL, 562-565 
neural networks, VHDL, 565-571 
PLL and prescalers, 377-381 
propagation delays and glitches, 
123-124 
pseudo-random sequence generators, 
93-94, 381-383 
registers, 319-320. See also registers 
scramblers/descramblers, 383-386 
shift register with load, VHDL, 
553-556 
shift registers, 353-355 
signal generators, 371-374 
state machines. See state machines 
switch debouncer, VHDL, 556-558 
synchronous counters, 355-368 
system resolution and glitches, 
410-411 
timers, VHDL, 558-560 
timing diagrams, 88-89 
sequential code, VHDL 
vs. concurrent code, 493-494, 501 
IF, CASE, LOOP, and WAIT 
statements, 503-506 
sequential decoders, 167 
serial bits and convolutional codes, 163 
serial enable, synchronous counters, 
355-358, 367-368 
Shannon’s theorem, 105, 106 
Shichman-Hodges MOSFET model, 
SPICE, 627 
shift operations 
division/multiplication with, 53-54 
shifters, 275-277 
types of, 52-53, 275 
shift operators, VHDL, 499 
shift registers (SRs) 
concepts, 89-90, 353-355 
convolutional code implementation, 
163 
Hamming code implementation, 159 
with load, VHDL design, 553-556 
timing diagram, 90 
shift_integer function, VHDL, 513-514 
shifters, 275-277 
Shockley Semiconductor, 2 
Shockley, William, 1-3 
short-circuit power consumption, 
73-74 
shortened RS codes, 160 
short-pulse generators, 278, 338-339 
Si (silicon), 2, 181-182, 197-198 


sign extension, 29-30, 51-52 
signal generators 
concepts, 374-375 
dual-edge, 375 
four-window, 373-374 
FSM design, 408-410 
generic design technique, 419-421 
large-window FSM design, 412-414 
programmable, 390 
single-edge, 372 
two-window, 372-373 
“universal”, VHDL, 575-578 
SIGNAL object, VHDL, 507 
signal voltages 
CMOS logic, 229 
TTL, 224-225 
signals 
continuous-valued vs. discrete- 
valued, 4 
physical, vs. logic values, 13-14 
signed adders/subtractors, 301-303, 
543-544 
signed comparators, 304-305 
signed number operations 
addition/subtraction, 49-52 
Booth’s algorithm, 57, 312 
division, 58-59 
modified Baugh-Wooley 
multiplier, 56 
multiplication, 56-57 
shift operations, 53-54 
signed adders/subtractors, 
301-303 
significand, 31 
sign-magnitude code, 26 
decimal range, 26 
Silicon Valley (California), 2-3 
simulation with SPICE 
AC response of RC circuits, 644-645 
AC source with sinusoidal sweep, 
634, 644 
basic structure of SPICE code, 
623-625 
capacitor declarations, 625 
DC analysis, 623 
DC response of CMOS inverters, 
639-641 
DC response of diode-resistor 
circuits, 638-639 
DC sweep, 631 
dependent source declarations, 635 
DFFs with subcircuits, 648-650 
diode declarations, 626 
exponential AC source, 633 
frequency-modulated AC source, 
633-634 


independent AC source declarations, 
631-634 
independent DC source declarations, 
630-631 
inductor declarations, 625-626 
inputs, 636-637 
Monte Carlo analysis, 623, 645-646 
Monte Carlo analysis of DR circuits, 
646-647 
MOSFET declarations, 627-630 
outputs, 637-638 
overview, 19-20, 621-622 
parametric analysis, 639 
piecewise linear AC source, 632 
PSpice tutorial, 667-671 
pulsed AC source, 631-632 
resistor declarations, 625 
schematics capture component, 
622, 636 
sinusoidal AC source, 632-633 
subcircuits, 648-650 
transient response of CMOS 
inverters, 641-642 
transient response of D-type latches, 
642-644 
types of simulations, 622-623 
simulation with VHDL 
adders, 611-612, 614-615 
clock dividers, 608-611, 613-614, 
616-617 
ModelSim tutorial, 657-666 
overview, 19 
stimulus generation and waveforms, 
603-605 
synthesis vs. simulation, 492, 601-602 
testbench templates, 607 
testbench testing, 606 
testbench types, 602-603 
testing stimuli, 605-607 
writing Type I testbenches, 607-612 
writing Type II testbenches, 612-615 
writing Type III testbenches, 615 
writing Type IV testbenches, 615-617 
single parity check (SPC) codes, 
154-155 
single-precision floating-point, 
30-33, 59 
single-pulse waveforms for simulations, 
603-605 
sinusoidal global clock, 343 
Si-SiGe HBT, 193-194 
Si-SiGe MOSFETs, strained, 210-211 
skew, clock, 277-278, 333-335 
slave DFFs. See master-slave D flip-flops 
small-signal model, 191, 209 
SMD (surface mount device), 11 
soft decisions 
LDPC decoders, 168 
Viterbi decoders, 168 
SOI MOSFETs, 211 
SONOS cells, 458 
SOP (sum-of-products) 
complement of, 120 
concepts, 80, 104 
Karnaugh maps and irreducible 
SOP, 119 
minterm expansion and irreducible 
SOP, 110 
minterms and SOP equations, 108-110 
SOP-based CMOS circuit, 260 
standard circuits for SOP and POS 
equations, 112-117 
sort_data procedure, VHDL, 515-516 
source, MOSFETs, 198 
SPICE (Simulation Program with 
Integrated Circuit Emphasis). 
See simulation with SPICE 
SPLDs (simple programmable logic 
devices), 468-471 
GAL devices, 471 
PAL devices, 468-470 
PLA devices, 470-471 
split-gate cells, 458 
SR (set-reset) flip-flop, 319, 348 
SR (set-reset) latches, 320 
SRAM (static random access memory) 
chip architecture, 436 
circuits, 434 
dual and quad data rate, 438-439 
memory-read, 434-435 
memory-write, 435-436 
sense amplifier, 436-437 
SRAM latches, 324, 325 
SRIS (static ratio insensitive) latches, 
324, 325, 326 
SRs. See shift registers 
SSDs (seven-segment displays), 266-268, 
525-527, 558-560 
SSTC flip flops, 333, 336, 342, 343 
SSTC1/2 latches, 321, 324, 325, 327, 328 
S-STG latch, 324, 325, 326 
SSTL (stub series terminated logic) 
standards, 240-244 
class I/II circuits, 241-242 
differential amplifier, 240 
termination resistors, 241 
state machines. See FSMs 
state transition diagram, 164, 397, 516 
static DL circuits, 323 
static D-type latches (DLs) 
current-mode, 327 
list of, 324 


multiplexer-based, 324-325 
RAM-type, 326-327 
static MOS architectures, 230-232 
static power consumption, 73 
static RAM. See SRAM 
staticizers, 326, 341 
statistically low-power DFFs, 343-344 
std library, 495 
steady-state voltages/currents, 622 
STG (static transmission-gate-based) 
flip flop, 333, 334 
STG (static transmission-gate-based) 
latch, 324, 325, 326 
stimulus generation for VHDL 
simulations, 603-605 
stimulus-only functional analysis, 
602, 607 
stimulus-only timing analysis, 602, 612 
storage delay time, 190 
strained Si-SiGe MOSFETs, 210-211 
straining, semiconductor, 194, 210 
Stratix II features, 485-486 
Stratix LAB and ALM, 481 
string detectors, 406-408, 573-575 
StrongARM flip flops, 333, 336, 337 
subcoding bytes and RS codes, 160 
sublattices, 146 
subsystem-level digital circuits, 7 
subthreshold regions, MOSFETs, 201 
subtraction 
borrow, 49 
floating-point, 59-61 
MSB check for signed, 51 
with N-bit inputs/(N+1)-bit output 
(signed), 51-52 
with N-bit inputs/output (signed), 
50-51 
signed, 49-52 
truth tables, 49 
subtractors, 301-303, 543-544 
sum and binary addition, 27, 47-48 
sum-product algorithms, 173 
supply voltages, 14, 75 
CMOS logic, 229, 237-240 
TTL, 224-225, 236-237 
surrogate pairs and Unicode, 38 
switch debouncer, 431, 556-558, 581-587 
switches, 258, 259 
switching frequency, 74 
symmetric-phase frequency dividers, 
421-423 
synchronism and FSMs, 417 
synchronous counters, 91-92, 355-368, 
404-406, 424-426 
0-to-7 counter, 402-404 
0-to-9 counters, 360-364 
3-to-9 counters, 365-367 
circuit and timing diagram, 91 
concepts, 355 
DFF-based, modulo-2^N, 357-358 
DFF-based, modulo-M, 362-364 
with four DFFs, 365 
large, 367-368 
with nonzero initial states, 364-367 
with parallel enable, 355-358, 367-368 
with regular DFFs, 362 
with regular TFFs, 360 
with serial enable, 355-358, 367-368 
TFF-based, modulo-2^N, 355-357 
TFF-based, modulo-M, 358-361 
with three DFFs, 366-367 
using DFFs with clear, 362-364 
using TFFs with clear, 361 
synchronous scramblers-descramblers, 
383 
syndrome decoding procedure, 158-159 
Synplify Pro (Synplicity), 492 
synthesis vs. simulation, VHDL, 19, 492, 
601-602 
systematic solutions, 154, 157, 170, 373 
system-level digital circuits, 6-7 


T 
T flip-flops (TFFs), 345-347 
asynchronous counters, 346-347 
synchronous modulo-2^N counters, 
355-357 
synchronous modulo-M counters, 
358-361 
Tanner graphs, 172 
tapped delay line, 354-355 
technology node, 11 
temperature ranges, TTL, 222 
tensile strain, 211 
termination resistors, 241 
test file, VHDL, 601-602, 606 
testbenches. See simulation with VHDL 
Texas Instruments, 3 
TG (transmission-gate) 
logic, 231-232, 259 
switches, 258, 259 
TG-C²MOS flip flop, 333, 334, 339, 340 
TG-C²MOS latch, 324, 325, 339 
three-state buffers. See tri-state buffers 
threshold voltage, MOSFETs, 200 
TIA/EIA-644-A (Telecommunications 
Industry Association/Electronic 
Industries Alliance no. 644-A) 
standard, 244 
time delays 
binary waveforms, 15-16 
transient responses, 190 
time response, 17, 190, 622 
timers 
circuits with multiple dividers, 376-377 
VHDL design, 558-560 
timing diagrams 
address-decoder, 265-266 
binary waveforms, 15-16 
for combinational circuits, 75-77, 
78-79 
for multiplexers, 271-272 
for parallel-serial multipliers, 310-311 
for sequential circuits, 88-89 
toggle flip-flops. See T flip-flops 
traces, PCB, 13 
transconductance, 191, 209 
transient responses 
bipolar transistors, 189-191 
introduction, 17-18 
MOSFETs, 207-208 
saturation buildup, 190 
Schottky (clamp) diode, 189-190 
SPICE analysis, 622 
SPICE simulation, 641-644 
time response, 17, 190, 622 
transistors 
AC responses, 18, 191-192, 209 
bipolar. See bipolar transistors 
BJT. See BJTs 
commercial, and transition 
frequency, 190 
DC responses, 17, 185-189 
diode-transistor logic, 188-189, 
220-221 
emitter, collector, and base, 1, 183 
field-effect. See MOSFETs 
gate-level vs. transistor-level 
analysis, 20 
HBT, 193-194 
historical notes, 1-3 
I-V characteristics, 184-185 
load line, 185-187, 202, 204-205 
MOSFET. See MOSFETs 
polysilicon-emitter BJT, 192-193 
saturation, 184-187 
semiconductors, 181-182, 197-198 
transient responses, 17-18, 189-191 
transistor-level digital circuits, 6-7 
TTL, 221-225 
transition density, line codes, 136 
transition frequency 
BJTs, 190, 191-192 
of commercial transistors, 191 
MOSFETs, 209 
transition voltage, 17, 206 
transmit/receive data, 99 
trellis diagrams, 165-166 


triggers, 279-280 
triode region, MOSFETs, 201 
tri-state buffers, 85-86, 258 
buffered multiplexers, VHDL, 494-495 
bus drivers, 86 
concepts, 259 
symbol, truth table, and 
implementation, 85 
truncation 
floating-point operations, 60, 61-62 
two’s complement example, 29-30 
truth tables 
concepts, 7-9 
types of, 108 
TSPC flip flops, 333, 336 
TSPC latches, 324, 328 
TTL (transistor-transistor logic) 
circuits, 221-222 
CMOS-TTL interface, 228 
fan-in/fan-out, 224 
I/O standards, 236 
noise margins, 225 
supply voltage/signal voltages, 
224-225, 236-237 
temperature ranges, 222 
versions, 223-224 
turbo codes, 170-171 
Bahl algorithm, 171 
parallel RSCs, 171 
recursive inputs, 170 
turn-off delays, 17 
turn-on delays, 17 
two-hot encoding, 424 
two’s complement code, 28-30 
extension and truncation example, 
29-30 
signed subtraction, 50 
signed/unsigned decimals 
example, 29 
two’s complementer, 303-304 


U 
UDC (unit-distance code), 24 
Unicode 
base multilingual plane, 38 
characters, 36 
surrogate pairs, 38 
UTF-16 encoding, 37, 38-39 
UTF-32 encoding, 37, 39 
UTF-8 encoding, 36-38 
Windows Glyph List 4, 36 
unipolar codes, 137-138 
unsigned adders/subtractors, 301-303, 
543-544 
unsigned number operations 
addition, 27-28, 47-49 


division, 57-58 
multiplication, 54-55 
shift operations, 53-54 
UTF-16 encoding, 37, 38-39 
UTF-32 encoding, 37, 39 
UTF-8 encoding, 36-38 
UTP (unshielded twisted pair), 134-135 


V 
VARIABLE object, VHDL, 507 
vertically reflected circuits, 261 
VHDL 
adders/subtractors and INTEGER, 
543-544 
adders/subtractors and 
STD_LOGIC_VECTOR, 544 
address decoder, 523-525 
architecture, 493-495 
arithmetic-logic unit, 547-549 
attributes, 500-501 
BCD-to-SSD converter, 525-527 
bit/bit vector syntax, 5, 22 
car alarm, 578-587 
carry-lookahead adders, 540-543 
carry-ripple adders, 539-540 
carry-ripple adders with COMPONENT, 
511-512 
CASE statement, 504, 517-519 
circuit synthesis and simulation with, 
19, 492 
code structure, 492-495 
combinational arithmetic circuits, 
539-549 
combinational logic circuits, 523-535 
concurrent code statements, 502-503 
concurrent vs. sequential code, 
493-494, 501 
CONSTANT, SIGNAL, and VARIABLE 
objects, 506-509 
conventional code, 510 
counting leading zeros, 505-506 
declarations, 510 
entity, 493 
Fibonacci series generators, 561-562 
frequency meters, 562-565 
functions, 513-514 
GENERATE statement, 502-503, 512 
GENERIC MAP declaration, 512 
GENERIC statement, 493, 501, 503, 506 
IF statement, 504, 508 
instantiation, 510 
introduction, 492 
LCD driver, 588-597 
library declarations, 492 
LOOP statement, 505-506 
multiplexer, 527-528 
multipliers/dividers with INTEGER, 
545-546 
multipliers/dividers with 
STD_LOGIC_VECTOR, 547 
neural networks, 565-571 
operators, 498-500 
packages, 509-510 
packages and libraries, 495-496 
parity detectors, 502-503 
predefined data types, 496-497 
priority encoder, 529 
procedures, 514-516 
ROM memory design, 530-532 
sequential circuits, 553-571 
sequential code statements, 503-506 
shift register with load, 553-556 
shift_integer function, 513-514 
signed and unsigned adders/ 
subtractors, 543-544 
signed and unsigned multipliers/ 
dividers, 545-546 
sort_data procedure, 515-516 
state machines, 516-519, 573-597 
string detectors, 573-575 
subprograms, 493, 501, 503 
switch debouncer, 556-558 
synchronous RAM memory design, 
532-535 
template for FSMs, 516-519 
timers, 558-560 
“universal” signal generator, 575-578 
user-defined data types, 498 
WAIT statement, 504 
WHEN statement, VHDL, 502-503, 
523-525 
VHSIC hardware description language. 
See VHDL 
Virtex 5, features, 485-486 
Virtex CLB and Slice, 480 
Viterbi decoder, 167-169 
delays, 168 
depth, 168 
hard/soft decisions, 168 
memory needed, 168 
volatile memory. See memories 


W 
WAIT statement, VHDL, 504 
waveforms. See also timing diagrams 
binary, 15-16 
generation for VHDL simulations, 
603-605 


WHEN statement, VHDL, 502-503 
Windows Glyph List 4, 36 
words, 5, 21 
writing to memory 
DRAM, 440 
SRAM, 435-436 


X 
Xilinx CPLDs, 475-477 
Xilinx Virtex 5, 18-19 
XNOR gates, 8, 81-83, 259 
CMOS circuit, 82 
even parity function, 82 
XOR gates, 8, 81-83, 259 
CMOS circuit, 82 
modulo-2 adders, 83 
odd parity function, 81 


Z 
zeros 
Karnaugh maps for, 120 
leading zeros, VHDL, 505-506 
nonreturn to zero line codes, 
138-139 
return to zero line codes, 138-139 